SMB3 kernel status
This page describes the plan, design and ongoing work to implement SMB3 and later dialects in the Linux kernel cifs/smb3 client (cifs.ko).
The minimum kernel version for SMB3 support is 3.12 (or a backport of cifs.ko module version 2.02 or later), but kernel 4.11 or later is recommended because it adds the SMB3 share encryption security feature. The default dialect was changed to SMB2.1 or later in the 4.14 kernel.
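As a rough illustration of the version notes above, the hypothetical Python helper below (`smb3_mount_options` is an illustrative name, not part of cifs-utils) maps a kernel version to a minimal set of cifs mount options. It assumes you want share encryption whenever the kernel supports it (the `seal` mount option requests encryption) and always passes `vers=` explicitly, since older kernels defaulted to an older dialect.

```python
def smb3_mount_options(kernel: tuple[int, int]) -> str:
    """Suggest cifs mount options for an SMB3 mount (hypothetical helper).

    Encodes the version notes on this page: SMB3 needs kernel 3.12 or later,
    share encryption ("seal") needs 4.11 or later, and vers= is always given
    explicitly because older kernels defaulted to an older dialect.
    """
    if kernel < (3, 12):
        raise ValueError("SMB3 needs kernel 3.12+ (or a cifs.ko 2.02+ backport)")
    options = ["vers=3.0"]
    if kernel >= (4, 11):
        options.append("seal")  # request SMB3 share encryption
    return ",".join(options)


print(smb3_mount_options((4, 19)))  # vers=3.0,seal
```

The resulting string would be passed to `mount -t cifs` via `-o`, alongside the usual credential options.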
Kernel client features/fixes by release can be found at https://wiki.samba.org/index.php/LinuxCIFSKernel
- SMB 2.0 (SMB2.02 dialect) was introduced with Windows Vista/2008 and includes a useful data integrity feature ("durable file handles"). Implementation in the kernel client is complete.
- SMB 2.1 was introduced with Windows 7/Windows Server 2008 R2.
  - Basic support for SMB 2.1 was added in kernel version 3.7.
  - Features done:
    - multi credit/large MTU
    - leases, resilient file handles
  - Features TODO:
    - branch cache
- SMB 3 (previously known as the SMB2.2 dialect) was introduced with Windows 8 and Windows Server 2012. SMB3 support in the kernel was much improved in kernel version 3.12. Status of the features defined by the SMB3 dialect:
  - Basic support for SMB3 is included, as are security improvements (faster, more secure packet signing, and secure negotiate, which protects against downgrade attacks).
  - In addition, the client can do network interface discovery (via a new FSCTL).
  - Still to do:
    - directory leases
    - multi channel
    - witness notification protocol (a new RPC service)
    - support for a miscellaneous set of loosely related storage features for virtualization (new FSCTLs, T10 block copy offload, TRIM, etc.)
    - remote shadow copy support
    - branch cache v2
- SMB3.02 was introduced in Windows 8.1 (Windows 'Blue') and Windows Server 2012 R2.
  - The SMB3.02 dialect is not yet negotiated by Samba servers.
  - The SMB3.02 dialect can be requested by the Linux cifs client ("vers=3.02" on mount), but the new optional features unique to SMB3.02 are not requested.
  - New protocol features, several of them particularly useful for virtualization (Hyper-V), include:
    - Unbuffered I/O flags (i.e. a 'no cache' flag which may be sent on a read or write)
    - A new RDMA remote invalidate flag
    - MS-RSVD (a set of remotable FSCTLs that improve "SCSI over SMB3")
    - Asymmetric shares (extensions to the Witness protocol that allow users of one share to be moved to a different server, e.g. for load balancing or maintenance; previously the Witness protocol could only do this per server rather than per share)
- SMB3.1.1 was introduced in Windows 10. New features it defines include:
  - Improved security negotiation ("negotiate contexts") and dynamically selectable cipher and hash algorithms.
  - In addition, the cifs client has added support for the new FSCTL for server-side copying of file ranges (DUPLICATE_EXTENTS).
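The dialects above correspond to the dialect revision numbers defined in MS-SMB2. The Python sketch below is a simplification that ignores negotiate contexts and secure negotiate validation; it shows how a client's offered dialects and a server's supported dialects resolve to the highest common revision. For example, a server that supports neither 3.02 nor 3.1.1 ends up at 3.0 even when the client offers newer dialects.

```python
from typing import Optional

# Dialect revision codes from MS-SMB2, keyed by the cifs "vers=" mount value.
DIALECTS = {
    "2.0": 0x0202,
    "2.1": 0x0210,
    "3.0": 0x0300,
    "3.02": 0x0302,
    "3.1.1": 0x0311,
}


def negotiate(client_offers: list[int], server_supports: list[int]) -> Optional[int]:
    """Return the highest dialect revision supported by both sides, or None."""
    common = set(client_offers) & set(server_supports)
    return max(common) if common else None


client = [DIALECTS[v] for v in ("3.0", "3.02", "3.1.1")]
server = [DIALECTS[v] for v in ("2.1", "3.0")]  # a server without 3.02 or 3.1.1
print(hex(negotiate(client, server)))  # 0x300
```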
Prerequisite / accompanying work
Implement durable open and durable reconnect, including reopening files.
Durable handle cross-node
No mechanism is currently in place to reconnect to a server other than the one that was mounted (e.g. we do not fail over to an alternate DFS referral if two servers export the same DFS path, and we do not support Witness protocol failover, so we cannot yet reconnect to a different server).
Supported via the mount option cache=strict, or by disabling oplocks, but no Linux mechanism exists to control buffering at per-write granularity.
Multi Credit / Large MTU
Leases
We request leases on open (unless the "cache=none" mount option is used, or oplocks are disabled via the module parameters when cifs.ko is loaded). There is currently no mechanism to request lease upgrades. Leases provide significant performance benefits in SMB3, just as oplocks did in the CIFS protocol.
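A minimal sketch of the lease decision described above, assuming the client asks for full read, handle and write (RWH) caching whenever leases are allowed. The lease state bit values come from MS-SMB2; `requested_lease_state` is a hypothetical helper, and its `enable_oplocks` argument mirrors the cifs.ko module parameter of the same name.

```python
# Lease state bits from MS-SMB2 (LeaseState field of the lease create context).
SMB2_LEASE_READ_CACHING = 0x01
SMB2_LEASE_HANDLE_CACHING = 0x02
SMB2_LEASE_WRITE_CACHING = 0x04


def requested_lease_state(cache_mode: str, enable_oplocks: bool) -> int:
    """Lease state to request on open; 0 means no lease (hypothetical helper)."""
    if cache_mode == "none" or not enable_oplocks:
        return 0
    # Assumption: request RWH and let the server grant whatever subset it can.
    return SMB2_LEASE_READ_CACHING | SMB2_LEASE_HANDLE_CACHING | SMB2_LEASE_WRITE_CACHING
```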
Resilient File Handles
Supported with the mount option "resilienthandles" (starting with the 4.4 kernel).
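Resilient handles are requested with an FSCTL after the file is opened. The sketch below builds the 8-byte NETWORK_RESILIENCY_REQUEST payload sent with FSCTL_LMR_REQUEST_RESILIENCY (control code and layout as defined in MS-SMB2). The 60 second timeout is an arbitrary example value, not necessarily the one cifs.ko uses, and `resiliency_request` is a hypothetical helper.

```python
import struct

FSCTL_LMR_REQUEST_RESILIENCY = 0x001401D4  # control code from MS-SMB2


def resiliency_request(timeout_ms: int) -> bytes:
    """Build NETWORK_RESILIENCY_REQUEST: Timeout (4 bytes) + Reserved (4 bytes), little-endian."""
    return struct.pack("<II", timeout_ms, 0)


payload = resiliency_request(60_000)  # hypothetical 60 second timeout
print(payload.hex())  # 60ea000000000000
```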
- Encryption and improved packet signing (in progress)
- Secure negotiate (complete)
Directory Leases
Currently implemented (and requested by default) only for the root directory of the mount. This speeds up path name revalidation, since the existence and metadata of the root directory no longer need to be requeried when opening a file or subdirectory.
- Directory leases allow caching of metadata reads and directory listings for the child objects of a directory (file leases, by contrast, cache data operations).
- The client maintains a separate cache for each user context, but uses a single lease to invalidate all of them. Separate caches are needed because access-based enumeration may return a different directory listing depending on the user context.
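The per-user caching described above can be modeled as a small data structure: one lease for the directory, one cached listing per user, and a lease break that drops every cached listing at once. This is a hypothetical Python model (`DirLeaseCache` is an illustrative name), not the kernel's implementation.

```python
from typing import Callable


class DirLeaseCache:
    """Per-user directory listings protected by a single directory lease."""

    def __init__(self) -> None:
        self.lease_held = False
        self._listings: dict[int, list[str]] = {}  # user id -> cached listing

    def grant_lease(self) -> None:
        self.lease_held = True

    def listing(self, uid: int, query: Callable[[int], list[str]]) -> list[str]:
        """Return the cached listing for uid, querying the server on a miss."""
        if self.lease_held and uid in self._listings:
            return self._listings[uid]
        result = query(uid)  # access-based enumeration: result depends on uid
        if self.lease_held:
            self._listings[uid] = result
        return result

    def lease_break(self) -> None:
        """One lease break invalidates every user's cached listing."""
        self.lease_held = False
        self._listings.clear()
```

Keying by user keeps one user's enumeration from leaking into another's view, while the single lease keeps invalidation cheap.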
Persistent File Handles
Persistent file handles are like durable file handles but with stronger guarantees. They are requested with the durable v2 create request context with the persistent flag set. Persistent file handles are supported with the mount option "persistenthandles" (starting with Linux kernel 4.4), but the implementation is not quite complete because the "ChannelSequenceNumber" is not yet set on reconnect.
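For reference, the durable v2 request (create context name "DH2Q" in MS-SMB2) carries a 32-byte data buffer. The sketch below packs that buffer with the persistent flag set or cleared. `durable_v2_request` is a hypothetical helper, and the surrounding create-context framing (name, offsets, lengths) is omitted.

```python
import struct
import uuid

SMB2_DHANDLE_FLAG_PERSISTENT = 0x00000002  # flag value from MS-SMB2


def durable_v2_request(timeout_ms: int, persistent: bool, create_guid: uuid.UUID) -> bytes:
    """Data buffer of a durable v2 create request.

    Layout: Timeout (4 bytes), Flags (4), Reserved (8), CreateGuid (16),
    all little-endian.
    """
    flags = SMB2_DHANDLE_FLAG_PERSISTENT if persistent else 0
    return struct.pack("<II8x", timeout_ms, flags) + create_guid.bytes_le


# Timeout 0 asks the server to use its default timeout.
blob = durable_v2_request(0, persistent=True, create_guid=uuid.uuid4())
```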
Multichannel
Multichannel allows the use of multiple network interfaces on the server (and optionally on the client) to significantly increase available network bandwidth. Experimental multichannel support was added to the kernel client in the 5.5 kernel, but its scalability for large I/O was limited by locking. Locking improvements that allow significantly better scalability for large file I/O will be in the 5.8 kernel.
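One way multichannel raises bandwidth for large I/O is by splitting a single large request into chunks spread over the available channels. The Python sketch below produces a simple round-robin striping plan; the policy and the `stripe` helper are illustrative assumptions, and the kernel client's actual channel-selection logic may differ.

```python
def stripe(offset: int, length: int, chunk: int, nchannels: int) -> list[tuple[int, int, int]]:
    """Split one large I/O into (channel, offset, size) requests, round-robin."""
    if chunk <= 0 or nchannels <= 0:
        raise ValueError("chunk and nchannels must be positive")
    plan = []
    end = offset + length
    request = 0
    while offset < end:
        size = min(chunk, end - offset)
        plan.append((request % nchannels, offset, size))
        offset += size
        request += 1
    return plan


# A 10 MiB read issued as 4 MiB requests over two channels:
for channel, off, size in stripe(0, 10 << 20, 4 << 20, 2):
    print(channel, off, size)
```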
Witness Notification Protocol
The witness service is an RPC service that allows a client to be actively notified about state changes of resources. The client asks the node it is connected to for a list of interfaces, and registers with the witness service on a different node for notifications about a resource, which might be a network name (typically the DNS name of the file server) or an interface group and IP address. Afterwards it can request notifications.
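The registration step above (register with the witness service on a node other than the one currently serving the client) can be sketched as follows. The `Interface` record is a hypothetical simplification of the interface list returned by the witness service (MS-SWN), and the selection rule is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    group: str       # hypothetical: name of the node / interface group
    ip: str
    available: bool


def pick_witness_ip(interfaces: list[Interface], connected_ip: str) -> Optional[str]:
    """Choose an available interface on a different node than the one in use."""
    current = next((i.group for i in interfaces if i.ip == connected_ip), None)
    for iface in interfaces:
        if iface.available and iface.group != current:
            return iface.ip
    return None  # no suitable node: witness registration is not possible
```

Registering on a different node matters because a notification that the serving node failed cannot come from that same node.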
Client Side Witness Daemon
Gunther has a prototype of this, but it is not yet packaged with cifs-utils.
- We need a tool to display the witness registrations.
- We need a tool to move a client to a different node.
SMB Direct
Also known as SMB 3.0 over RDMA. Implemented in the 4.16 kernel and no longer marked experimental as of the 5.3 kernel. It provides significant performance benefits (and reduced CPU usage) for large I/O.
- SMB-Direct backend for the smb_transport abstraction
- RDMA read/write support in the client, which is what provides the large I/O and CPU utilization benefits
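SMB Direct carries small payloads inline in send/receive messages and uses RDMA read/write only for larger transfers, which is where the CPU savings mentioned above come from. The sketch below models that choice; the 4096-byte threshold and the `transfer_mode` helper are illustrative assumptions.

```python
RDMA_RW_THRESHOLD = 4096  # illustrative cut-over size in bytes (assumption)


def transfer_mode(io_size: int, rdma_available: bool) -> str:
    """Decide how the payload of one read or write would be moved."""
    if rdma_available and io_size > RDMA_RW_THRESHOLD:
        # Large I/O: the server reads/writes client memory directly via RDMA,
        # so the payload is not copied through message send/receive buffers.
        return "rdma-read-write"
    # Small I/O (or plain TCP): the payload travels inside the message itself.
    return "inline"
```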
Remote Shadow Copy (FSRVP)
Not an SMB 3.0 specific feature per se.
- Need to add:
  - rpcclient support for FSRVP commands
  - a user interface (/proc, /sys or ioctl) and tools for this
Branch Cache v2
Branch Cache is a wide area network caching protocol implemented in Windows 7 and later. It allows the server to return hashes of the data to the client, which can then use these hashes to request copies of the actual data from nearby systems, saving network bandwidth. Although Branch Cache is not SMB3-specific (it is also used with HTTP, for example), it is useful in conjunction with SMB2.1 and SMB3 file serving to improve WAN performance and optimize bandwidth usage. See MS-PCCRC, MS-PCCRD, MS-PCCRR.
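The hash-then-fetch idea above can be sketched in a few lines: the server publishes segment hashes, and the client uses a segment from a nearby peer only if it matches the expected hash, otherwise fetching it over the WAN. This is a deliberately simplified model; the real content information format (MS-PCCRC) uses a more elaborate segment and block hashing scheme, and the helper names are hypothetical.

```python
import hashlib
from typing import Callable


def segment_hashes(data: bytes, seg_size: int) -> list[bytes]:
    """Server side: hash fixed-size segments of a file (simplified)."""
    return [hashlib.sha256(data[i:i + seg_size]).digest()
            for i in range(0, len(data), seg_size)]


def retrieve(hashes: list[bytes], peer_cache: dict[bytes, bytes],
             fetch_from_server: Callable[[int], bytes]) -> tuple[bytes, int]:
    """Client side: prefer verified peer copies, else fetch over the WAN.

    Returns the reassembled data and the number of segments fetched from the server.
    """
    parts, wan_fetches = [], 0
    for index, digest in enumerate(hashes):
        segment = peer_cache.get(digest)
        if segment is None or hashlib.sha256(segment).digest() != digest:
            segment = fetch_from_server(index)
            wan_fetches += 1
            peer_cache[digest] = segment  # now available to other local clients
        parts.append(segment)
    return b"".join(parts), wan_fetches
```

Verifying each peer copy against the server-supplied hash is what lets the client trust data it did not get from the server.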
SMB3.02
See http://www.snia.org/sites/default/files2/SDC2013/presentations/SMB3/DavidKruse_SMB3_Update.pdf
SMB3.02 is very similar to SMB3 but adds some optional features. Note that the Linux CIFS client can negotiate the SMB3.02 dialect (with these optional features disabled) by specifying vers=3.02 on mount. Samba server cannot currently negotiate SMB3.02, as it does not support the new READ/WRITE flags (and the Witness protocol improvements for SMB3.02 are not possible until the prerequisite optional SMB3.0 features they are based on are added).
Currently cifs.ko can negotiate the SMB3.02 dialect (vers=3.02) but does not request the optional features listed below, so a vers=3.02 mount behaves much like a vers=3.0 mount.
SMB Direct Remote Invalidation
Improves performance.
New ReadWrite Flags
SMB2_READFLAG_READ_UNBUFFERED and SMB2_WRITEFLAG_WRITE_UNBUFFERED allow the client to indicate whether an individual I/O request (read or write) should be cached by the server. The Linux kernel VFS has no interface for per-I/O flags yet, so sending these flags on the wire would require a private ioctl on the client.
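The flag values below are those defined in MS-SMB2, where the unbuffered flags are only valid for the 3.02 dialect and later. The hypothetical helpers sketch how a client might set them per request, assuming some private interface (such as the ioctl mentioned above) tells it which requests should be unbuffered.

```python
SMB2_READFLAG_READ_UNBUFFERED = 0x01          # READ request Flags (MS-SMB2)
SMB2_WRITEFLAG_WRITE_UNBUFFERED = 0x00000002  # WRITE request Flags (MS-SMB2)
SMB302_DIALECT = 0x0302


def read_flags(dialect: int, unbuffered: bool) -> int:
    """Flags for one READ request; unbuffered is only valid for 3.02 and later."""
    return SMB2_READFLAG_READ_UNBUFFERED if unbuffered and dialect >= SMB302_DIALECT else 0


def write_flags(dialect: int, unbuffered: bool) -> int:
    """Flags for one WRITE request; unbuffered is only valid for 3.02 and later."""
    return SMB2_WRITEFLAG_WRITE_UNBUFFERED if unbuffered and dialect >= SMB302_DIALECT else 0
```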
Asymmetric Shares
The Witness protocol can now signal Windows clients to 'move' from one share to another. This allows more flexible migration: a volume can be taken offline without taking the whole server down, and applications keep running even as the storage they use is moved. Previous versions of the Witness protocol allowed users of one server to be moved to another server; this allows more granular movement, since users of a particular share can now be redirected on the fly to another share.
Cluster-Wide Durable Handles
Work in progress