Linux SMB2 client design
To make coding easier, the following describes various design considerations for the Linux SMB2 client. The SMB2 network file system protocol is the successor to the SMB/CIFS protocol, and is the default network file protocol for various operating systems. Although most, if not all, current servers and NAS appliances continue to support the CIFS network file system protocol, the improved scalability limits, performance and features of SMB2 make it a high priority to add SMB2 client support to the Linux kernel. SMB2 also includes some new features such as symlink support, which are not part of the original CIFS protocol (although they are available in the CIFS POSIX Protocol Extensions which Samba and the Linux CIFS kernel client, among others, implement).
Although alternative network file systems (such as NFSv4) and cluster file systems have been implemented for Linux, only a small percentage of servers and NAS appliances implement those protocols, limiting the usefulness of other (non-SMB/CIFS or SMB2) network file system clients. With the release of detailed protocol documentation by Microsoft over the past few years, SMB2 (and related subprotocols such as MS-DFS) is likely to become broadly adopted.
Linux has a VFS model which is described in detail in Documentation/filesystems/vfs.txt. Additional references are included in http://pserver.samba.org/samba/ftp/cifs-cvs/ols2007-fs-tutorial-smf.pdf
- Support SMB2 mounts from Linux (kernel) to common Windows servers including Vista and Windows 2008, NAS appliances, and Samba servers which support SMB2.
- In developing the smb2 module, minimize the risk of breaking the cifs module in the mainline Linux kernel, as the cifs module is considered stable and is used by many. As with most other examples among Linux kernel file systems, this is easiest to achieve by making the smb2 module distinct from cifs. In addition, making the cifs and smb2 modules distinct is indicated by the very large difference between the two protocols. SMB2 is not a "dialect" of SMB/CIFS but an almost completely different protocol, which shares with SMB/CIFS only a subset of the ntfs information levels, the use of UCS-2 strings, one class of error messages (NT Status codes) and the same tcp port.
- SMB2 is a much smaller protocol than SMB/CIFS, and mostly offers more features, although SMB2 currently lacks "POSIX Extensions" (Unix Extensions).
- In the longer term duplicate code can and should be moved to the VFS, or to a common helper module, as code stabilizes and risks can be minimized.
- Implement code in critical paths (e.g. memory management, writepages, network reconnection) in kernel (rather than via upcall), with as few userspace dependencies as possible, in order to avoid deadlocks in low memory situations.
- Obvious design problems ("lessons learned") with the cifs client should be addressed where possible, and kernel coding style should be followed more strictly (removing "camel case" where possible, and where doing so does not conflict with the names of structures in the protocol standards documents).
- Variable and (network protocol) structure element names should correspond to those in the protocol standard where possible, to ease long term maintenance and readability.
- Support at least the same feature set that the Linux cifs kernel client does, unless the feature is obsolete (e.g. nonblocking tcp send mount option, or lanman authentication) or unimplementable currently (mount options dealing with the cifs unix/posix extensions).
Coding Requirements and Build
The smb2 module should build cleanly, with as few warnings as possible, so that we can quickly find errors in newly introduced code. The presence of "expected" build warnings would increase the probability of unnoticed bugs creeping into the code, so:
1) Before submitting patches for inclusion in the SMB2 project, all patches must pass the kernel scripts/checkpatch.pl script with the following exceptions: typedefs in the protocol definitions are permitted (if needed to better match protocol definitions and structure names, e.g. in smb2pdu.h, to the documentation). Common sense should be used in interpreting checkpatch results if following checkpatch would make the code much harder to read.
2) The SMB2 client should pass
make modules CF="-D__CHECK_ENDIAN__"
with no warnings other than those which are out of our control (e.g. fmode is defined awkwardly in include/linux, outside of the smb2 code, and it is impossible to avoid warnings of certain usages of mode flags). Even for those which are due to incorrect definitions in include/linux, if they can be fixed safely in common Linux headers, we should do that.
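As a rough illustration of what the endian check is meant to protect, the following userspace sketch shows a wire structure whose fields are always stored little endian and are only read or written through explicit conversion helpers. The type names le16_t and le32_t, the structure example_wire_rec and the *_sketch helpers are assumptions made for this example; in the kernel the equivalent types would be __le16 and __le32, and sparse would warn when one is mixed with a CPU-order integer without conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins (assumption): in the kernel these would be the
 * sparse-annotated __le16 and __le32 types. */
typedef uint16_t le16_t;
typedef uint32_t le32_t;

/* Hypothetical wire structure; the field names are illustrative only. */
struct example_wire_rec {
	le16_t structure_size;
	le16_t flags;
	le32_t length;
};

static_assert(sizeof(struct example_wire_rec) == 8,
	      "wire structure must not contain padding");

/* Store a CPU-order value as little endian bytes, whatever the host is. */
static inline le16_t cpu_to_le16_sketch(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	le16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

static inline le32_t cpu_to_le32_sketch(uint32_t v)
{
	uint8_t b[4] = { (uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
			 (uint8_t)((v >> 16) & 0xff), (uint8_t)(v >> 24) };
	le32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Read little endian bytes back into a CPU-order value. */
static inline uint32_t le32_to_cpu_sketch(le32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Every store into the wire structure goes through a conversion helper. */
static inline void fill_example_rec(struct example_wire_rec *rec,
				    uint32_t length)
{
	rec->structure_size = cpu_to_le16_sketch((uint16_t)sizeof(*rec));
	rec->flags = cpu_to_le16_sketch(0);
	rec->length = cpu_to_le32_sketch(length);
}
```

A direct assignment such as rec->length = length compiles in plain C, which is why a separate checking pass is useful: it is the kind of mistake that only shows up on big endian hosts.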
Like most Linux file systems the Linux SMB2 kernel client has few external interfaces, but many internal kernel (VFS) entry points.
- The mount(8) command interacts with the helper utility mount.smb2 if present, which resolves host names to ip addresses if needed, and invokes the system call mount(2), which calls into smb2 at the VFS entry point "smb2_get_sb." The mount command and smb2 helper utilities do little parsing of smb2 mount options in order to avoid complex kernel/user-space dependencies (which used to be a problem in a few older file systems), but host name parsing and address resolution are awkward to do in kernel, so they are done in user space.
- The umount(8) command interacts with umount.smb2 if present. If it turns out that "user unmounts" now can be implemented without umount.smb2, then this helper utility (umount.smb2) can be removed. Unmount enters the smb2 module from the VFS at the entry point "smb2_put_sb."
- Various system calls in libc invoke the kernel VFS and enter smb2 through functions exported in the usual way (as done by other Linux kernel filesystems) through the function tables "smb2_inode_operations" and "smb2_file_operations" and "smb2_dentry_operations" and "smb2_super_operations."
- The proc interface allows smb2 configuration options to be displayed and modified, allows runtime debugging information to be retrieved, and allows the display of statistics and mount options. Some proc filenames are standard across all filesystems (e.g. /proc/mounts, /proc/<pid>/mounts, /proc/<pid>/mountstats) while others are stored in smb2 specific locations in /proc/fs/smb2. The file layout under /proc/fs/smb2 is intended to match that of /proc/fs/cifs as closely as reasonably possible. Adding new smb2 proc entries which export information similar to entries in /proc/fs/nfs and /proc/fs/nfsfs could also be considered.
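The function table mechanism mentioned above can be sketched in userspace as follows. This is a simplified mock, not the kernel interface: struct mock_inode_operations, mock_vfs_unlink and the table name smb2_dir_inode_ops_sketch are hypothetical, and the real struct inode_operations callbacks take inodes and dentries rather than path strings. Only the dispatch pattern is the point: generic code calls through a per-filesystem table of function pointers and never names the smb2 functions directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a VFS operations table (assumption: the real
 * kernel callbacks have different arguments). */
struct mock_inode_operations {
	int (*unlink)(const char *name);
	int (*mkdir)(const char *name, unsigned int mode);
};

/* Filesystem-specific implementations, named with the "smb2_" prefix.
 * These stubs only validate their argument. */
static int smb2_unlink(const char *name)
{
	return (name && name[0]) ? 0 : -EINVAL;
}

static int smb2_mkdir(const char *name, unsigned int mode)
{
	(void)mode;
	return (name && name[0]) ? 0 : -EINVAL;
}

/* The table the filesystem exports. */
static const struct mock_inode_operations smb2_dir_inode_ops_sketch = {
	.unlink = smb2_unlink,
	.mkdir  = smb2_mkdir,
};

/* What the generic layer does: look up the operation and call it. This
 * mock returns -EPERM when the operation is not provided. */
static int mock_vfs_unlink(const struct mock_inode_operations *ops,
			   const char *name)
{
	assert(ops != NULL);
	if (!ops->unlink)
		return -EPERM;
	return ops->unlink(name);
}
```

With this shape, a second table (for example one filled with cifs_ functions) could be substituted without touching the caller.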
Although the SMB2 module, like most Linux file systems, is likely to be a small monolithic module (a single .ko file), there are multiple implied layers.
- The upper layer: The kernel VFS interface itself (struct inode_operations, struct file_operations etc.) is the natural top layer of the smb2 module, and defines a series of operations whose functions are named for the corresponding vfs operation with the prefix "smb2_" (e.g. "smb2_unlink"). To avoid having very large C files, the functions can be split across multiple files like fs/smb2/dir.c, file.c and inode.c depending on the type of object they act on. In the future if other implementations (e.g. CIFS) were merged into the same module, additional struct inode_operations (e.g. which would contain functions like "cifs_mkdir" vs. "smb2_mkdir") could be added.
- The "network file system protocol layer" defines the series of 15 or 20 needed SMB2 specific protocol operations abstractly, so that the upper layer does not need to know how to decode SMB2 headers and bodies. This is expected to be quite a bit different than the same layer in CIFS due to differences in structures, sizes (file identifiers) and since SMB2 is a "handle based" rather than "path name based" protocol (as CIFS mostly was). Although SMB2 does not yet have different protocol operations for different dialects (at least none which we need to worry about until SMB2 Posix Extensions are defined), SMB/CIFS did have multiple versions of certain functions due to dialect differences, but due to the need to do "legacy fallback" (on certain errors) they could not be dereferenced as dialect specific function pointers. If dialect specific "network file system protocol layer" functions (e.g. SMB2_read vs. SMB21_read or SMB2_posix_read) are introduced they should have identical calling conventions for consistency. This layer is implemented in fs/smb2/smb2pdu.c ("pdu" = "protocol data unit" functions) and smb2pdu.h (which contains the definitions for the SMB2 protocol on the wire). Since smb2 protocol frames are not marshaled or endian converted (smb2 is purely little endian) there is no need for an additional encoding layer (nothing like xdr is needed), and an additional layer here could make frame validation harder (since smb2pdu.h is defined carefully to match the specification and the "sparse" tool checks for format mismatches in the "network file system protocol layer").
- The "transport layer" for SMB2 and CIFS, unlike NFS or AFS, which each sit on complex RPC layers, is simply TCP. For simplicity, the original designers of SMB made only two requirements on the networking layers below it: reliable transport, and framing (SMB/CIFS and SMB2 frames begin with a 4 byte length prefix). As a result, SMB2 can in theory run over any socket based transport interface, although TCP (whether with IPv4 or IPv6 addresses) is most common (the Netbios frames protocol is no longer used, but SCTP/RDMA or shared memory transports which expose a Linux kernel socket API may be easy to add in the future). To hide any complexities of kernel socket handling, all synchronous SMB2 functions (i.e. all functions except readpages, writepages, blocking byte range locks, and inotify) use a "transceive" (i.e. send followed by wait for response) interface which takes a kvec (an io vector of pointers to the data to be sent), a pointer to the credentials (per user session information) and the socket. The asynchronous operations will of necessity need a more complex calling convention, which is to be determined later. By using a synchronous "smb2_send_receive" wrapper function, the underlying kernel socket handling code can be changed without affecting the rest of the module. Using a similar approach to cifs, but stripping out unneeded code, the transport handling code under "smb2_send_receive" was prototyped, and was reasonably small (under 1000 lines of code). It would be desirable to keep the transport handling code small to ease future maintenance, but if other kernel code (such as in /net) can be adapted for this purpose with reasonable modifications, that would be ideal.
Support for sending multiple SMB requests at one time, and support for asynchronous notification, will be required. The Linux kernel (the /net directory in particular) provides multiple examples of how to do this (some using work queues, some using kevents, and some using the NFS SunRPC task structs). In the meantime, the implementation of sockets and SMB2 signing is hidden from the other layers of the SMB2 code, can be modified without harm to the rest of the SMB2 module, and is expected to be relatively small (under 5% of the total code). The current prototype implementation of the "send" part of the transport layer is 400 lines of code in transport.c (somewhat similar to the cifs code, but smaller), and the socket setup and demultiplexing code is slightly larger and is in connect.c. As there are only two exports in transport.c (smb2_sendrcv and smb2_sendrcv_norsp), there are few if any dependencies on transport.c, and alternate implementations of the socket handling can be explored.
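The framing requirement described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct iovec from <sys/uio.h> stands in for the kernel kvec, the helper names are hypothetical, and the 4 byte prefix is treated as a big endian length whose top byte is zero (which limits a frame to 0x00FFFFFF bytes in this sketch). A real transceive function would go on to send the prefix and vectors on the socket and wait for the matching response.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Assumption for this sketch: the top byte of the prefix stays zero. */
#define SKETCH_MAX_FRAME 0x00FFFFFFu

/* Sum the lengths of all vectors that make up one request. */
static size_t iov_total_len(const struct iovec *iov, int n)
{
	size_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		total += iov[i].iov_len;
	return total;
}

/* Fill the 4 byte length prefix for the frame described by iov.
 * Returns 0 on success, -1 if the frame is too large for the prefix. */
static int build_length_prefix(uint8_t prefix[4], const struct iovec *iov,
			       int n)
{
	size_t total;

	assert(prefix != NULL && iov != NULL);
	total = iov_total_len(iov, n);
	if (total > SKETCH_MAX_FRAME)
		return -1;
	prefix[0] = 0;
	prefix[1] = (uint8_t)(total >> 16);
	prefix[2] = (uint8_t)(total >> 8);
	prefix[3] = (uint8_t)total;
	return 0;
}

/* Receive side: recover the frame length so the reader knows how many
 * bytes to pull from the socket. Returns 0 on success, -1 on a bad prefix. */
static int parse_length_prefix(const uint8_t prefix[4], size_t *len)
{
	assert(prefix != NULL && len != NULL);
	if (prefix[0] != 0)
		return -1;
	*len = ((size_t)prefix[1] << 16) | ((size_t)prefix[2] << 8) |
	       (size_t)prefix[3];
	return 0;
}
```

Because the request is described by an array of vectors, the header, fixed body and any path name string can stay in separate buffers and need not be copied into one contiguous frame before sending.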
- String handling - SMB2 is Unicode (UCS-2/UTF-16) only; there are no more server codepages. String handling functions rely on versions Jeff Layton developed for CIFS, which convert between UCS-2 and the local codepage, and optionally allocate memory for the string and calculate the length of the string in bytes. Since SMB2 frames are sent to the transport as an array of vectors, these string pointers are convenient to use, and path names are converted directly from dentries to Unicode. Eventually these string handling functions are expected to merge into common code in the VFS as they stabilize.
- Error mapping - SMB2 uses NT Status codes only (no more smb errors) so there is a very large mapping table for defining error names and the corresponding posix errors. As the SMB2 NT Status codes are more granular than POSIX errors, both POSIX errors and NT Status codes are passed back by many functions. As an alternative, POSIX errors could be embedded in NT Status codes, but that would increase the code length in the VFS layer (there are a few, easy to identify, places in the client, where the POSIX error is insufficient). The large error mapping table (NT Status to POSIX errors) may eventually be used by the cifs client as well (and move to common code), but cifs will still need to retain cifs specific mapping of older smb errors to posix errors somewhat limiting the savings.
- Authentication - Although SMB2 only supports SPNEGO encapsulated extended security blobs (encapsulating either Kerberos or NTLMSSP in SPNEGO via ASN.1), and does not support an equivalent of the many other authentication types which cifs had to support, the two clients can share the same authentication upcall, and can share some of the code needed to build NTLMSSP blobs and to calculate NTLMv2 blobs. For obvious security reasons, smb2 should not include support for weaker password hashes (weaker than NTLMv2), as common SMB2 servers support NTLMv2 (and in some cases Kerberos). In the long run, a new implementation of kernel ASN.1 handling (perhaps leveraging the implementation in net/sunrpc), rather than always relying on an upcall (especially on reconnect in low memory situations), would be helpful, but in the interim the current cifs upcall approach (to build the ASN.1 SPNEGO/Kerberos blob) can be used.
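The error mapping approach described above (a table from NT Status codes to POSIX errors, with both values passed back to the caller) might look roughly like the following userspace sketch. The table holds only a handful of well known status codes, the POSIX value chosen for each is an illustrative guess rather than the module's actual mapping, and the names smb2_rc_sketch and map_nt_status_sketch are hypothetical. Unknown codes fall back to -EIO in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the mapping table. */
struct status_to_posix {
	uint32_t nt_status;
	int posix_error;
	const char *name;
};

/* A few entries only; a real table would be very large. */
static const struct status_to_posix sketch_map[] = {
	{ 0x00000000u, 0,       "STATUS_SUCCESS" },
	{ 0xC000000Du, -EINVAL, "STATUS_INVALID_PARAMETER" },
	{ 0xC0000022u, -EACCES, "STATUS_ACCESS_DENIED" },
	{ 0xC0000034u, -ENOENT, "STATUS_OBJECT_NAME_NOT_FOUND" },
	{ 0xC0000035u, -EEXIST, "STATUS_OBJECT_NAME_COLLISION" },
	{ 0xC000007Fu, -ENOSPC, "STATUS_DISK_FULL" },
};

/* Both values are handed back, since the NT Status code is more granular
 * than the POSIX error derived from it. */
struct smb2_rc_sketch {
	uint32_t nt_status;
	int posix_error;
	const char *name;	/* NULL when the code is not in the table */
};

static void map_nt_status_sketch(uint32_t status, struct smb2_rc_sketch *rc)
{
	size_t i;

	assert(rc != NULL);
	rc->nt_status = status;
	rc->posix_error = -EIO;	/* fallback chosen for this sketch */
	rc->name = NULL;
	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++) {
		if (sketch_map[i].nt_status == status) {
			rc->posix_error = sketch_map[i].posix_error;
			rc->name = sketch_map[i].name;
			return;
		}
	}
}
```

A caller that only needs a return value for the VFS uses posix_error, while the few places that need the finer distinction can still inspect nt_status.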