Difference between revisions of "SMB3-Linux"

Revision as of 18:55, 24 May 2018

There are various requirements for full POSIX compatibility, and other requirements which although not strictly POSIX (such as support for symlinks and the fallocate system call) are common in Linux and various Unix variants and useful to applications. The goal is to implement emulation strategies and extensions to the SMB3 protocol which are as small as reasonably possible but implement the most important of these missing features, allowing the network file system to appear nearly identical to a local file system to users and the applications they run, without creating unacceptable performance or configuration problems.

The general requirements for SMB3 POSIX extensions include the following:

  1. POSIX mode bits (the primitive 0777 bits used to control who can access a file)
  2. POSIX file ownership (UID and GID owners. Windows typically only has one or the other, and expresses them as global "SIDs" with longer UUIDs rather than locally defined UIDs)
  3. symlinks
  4. case sensitivity
  5. mapping 7 reserved characters (not allowed in SMB3/CIFS/NTFS/Windows but allowed in POSIX). They include: * ? < > : | \
  6. mkfifo and mknod
  7. POSIX unlink and rename behavior:
    1. unlink: deleting an open file, removing it from the namespace, occurs in POSIX but not Windows
    2. rename: renaming a directory that has open files is perfectly legal in POSIX but not in Windows (even recursively)
  8. POSIX "advisory" byte range locks (SMB3 allows Windows style "mandatory" byte range locks). POSIX locks are also merged when they overlap, and all locks are released on file close, making them both confusing to use (locally on Linux file systems, and even more so over network file systems) and more difficult to emulate. Although many dislike the POSIX byte range lock behavior, its implementation in SMB3 would help some applications.
  9. Slight differences in the "stat" system call (and the mode/ownership information noted above)
  10. Additional information returned by the "statfs" system call: f_files; /* total file nodes in file system */ and f_ffree; /* free file nodes in fs */
  11. "POSIX ACL" support. Linux implements an ACL model for local file systems which is less complex than the more common "RichACLs" (i.e. NFSv4 or NTFS/SMB/SMB3 ACLs) and easier to understand.
  12. fallocate: many fallocate options are available, most but not all are mappable to various existing SMB3 ioctls.


Current status:

  1. POSIX mode bits: emulatable via "cifsacl" (a cifs.ko mount option which pulls them from the server's "RichACL" (NTFS/SMB3/NFSv4)). Using an approach similar to the "NFSv4 mode ACE" may be helpful as well. Prototype not complete. SMB3_SetACL and SMB3_GetACL worker functions for Linux's cifs.ko have been prototyped but not reviewed yet.

smbd status: can be set on create via a create context (POSIX extension).

  2. POSIX file ownership (see above).

smbd status: can be set on create via a create context (POSIX extension).

  3. symlinks: use the "mfsymlinks" approach used by Apple among others. Implemented in cifs.ko. Will be in kernel 3.18 and later. Should be backportable to earlier kernels.

smbd status: looks like this hasn't changed.

  4. case sensitivity: Not available yet, requires an extension to the SMB3 OpenCreate call - a new "POSIX Create Context" has been proposed.

smbd status: looks like this hasn't changed.

  5. mapping 7 reserved characters: There are three ways to do this: the "POSIX Create Context", Microsoft's "SFU" (SUA) mapping and Apple's "SFM" mapping. The SFU mapping is available in CIFS (and SMB3 in 3.18) with the "mapchars" mount option, but we plan to use the Apple ("SFM") mapping approach by default in the 3.18 kernel and later (Samba requires the "vfs_fruit" module to implement the Apple mapping of the seven reserved characters).

smbd status: looks like this hasn't changed.

  6. mkfifo and mknod: are emulated using the same approach that Microsoft SFU and others did. Uses the "sfu" mount option (available in 3.18 kernel or later).

smbd status: device nodes can be retrieved via POSIX info level (POSIX Extension).

  7. POSIX unlink and rename behavior. Emulatable over SMB3 for most cases (using "delete on close" and using an approach like "nfs silly rename"). The 3.18 kernel will handle these better, but the "POSIX Create Context" is still likely to be required for a few use cases.

smbd status: looks like this hasn't changed.

  8. POSIX Advisory byte range locks: emulated via mandatory locks today, and can also be "local only" (with the cifs.ko mount option "nobrl"). Requires "POSIX Create Context".
  9. stat (see above)
  10. statfs: For the two fields which are not retrievable other ways (minor issue). "POSIX Create Context" can be used.

smbd status: looks like this is done.

  11. POSIX ACLs: Could be mapped to SMB3/NTFS RichACLs which are a superset of POSIX ACLs. Also could be handled via "POSIX Create Context".

smbd status: looks like this is done.

  12. fallocate (partially implemented already) and also a few other new Linux syscalls which are not broadly implemented: more research needed.

smbd status: looks like this hasn't changed.
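
The reserved character mapping in item 5 can be sketched as follows. This is only an illustration and rests on an assumption: that the SFU-style mapping moves each of the seven reserved characters into the Unicode private use area by adding 0xF000 to its code point. The code points used by the SFM mapping differ and are not shown here. All function names and the REMAP_BASE constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REMAP_BASE 0xF000u /* assumed private use area offset (SFU style) */

/* True for the seven characters POSIX allows but SMB3/NTFS rejects. */
static int is_reserved(uint16_t c)
{
    return c != 0 && c < 0x80 && strchr("*?<>:|\\", (int)c) != NULL;
}

/* Client to server: hide a reserved character in the private use area. */
static uint16_t remap_to_wire(uint16_t c)
{
    return is_reserved(c) ? (uint16_t)(c + REMAP_BASE) : c;
}

/* Server to client: undo the mapping, leaving other code points alone. */
static uint16_t remap_from_wire(uint16_t c)
{
    uint16_t low;

    if (c < REMAP_BASE) {
        return c;
    }
    low = (uint16_t)(c - REMAP_BASE);
    return is_reserved(low) ? low : c;
}

/* Remap one UTF-16 path component in place before sending it. */
static void remap_name(uint16_t *name, size_t len)
{
    assert(name != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        name[i] = remap_to_wire(name[i]);
    }
}
```

The mapping is applied per path component, so a backslash inside a POSIX file name is remapped while the backslashes that separate components on the wire are left untouched.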

samba POSIX extension status

As of 24-05-2018 from JRA's master-smb2 branch:

Negotiate Context

SMB2_POSIX_EXTENSIONS 0x100

Actual length/fields not decided yet, use the context data length field.

Create Context

For client requests

context length = 4

#define SMB2_CREATE_TAG_POSIX "\x93\xAD\x25\x50\x9C\xB4\x11\xE7\xB4\x23\x83\xDE\x96\x8B\xCD\x7C"

blob[0] = le32 unix_perms_to_wire(mode & ~S_IFMT)

#define UNIX_X_OTH                      0000001
#define UNIX_W_OTH                      0000002
#define UNIX_R_OTH                      0000004
#define UNIX_X_GRP                      0000010
#define UNIX_W_GRP                      0000020
#define UNIX_R_GRP                      0000040
#define UNIX_X_USR                      0000100
#define UNIX_W_USR                      0000200
#define UNIX_R_USR                      0000400
#define UNIX_STICKY                     0001000
#define UNIX_SET_GID                    0002000
#define UNIX_SET_UID                    0004000

For responses

context length = 12 + 2*(8 + 4*sid->num_auths);
               = 12 (bug?)
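
The intended response length can be sketched as below. This follows the formula above (12 fixed bytes plus two SIDs of 8 + 4*num_auths bytes each) but lets the owner and group SIDs have different sub-authority counts. The function names and the POSIX_CC_FIXED_LEN constant are hypothetical, and the limit of 15 sub-authorities is the usual Windows SID maximum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POSIX_CC_FIXED_LEN 12u /* three le32 fields: nlink, reparse tag, mode */

/* Wire size of one SID: 8 header bytes plus 4 bytes per sub-authority. */
static size_t sid_wire_size(uint8_t num_auths)
{
    assert(num_auths <= 15); /* a SID carries at most 15 sub-authorities */
    return 8 + 4 * (size_t)num_auths;
}

/* Total context length once both SIDs are appended to the fixed part. */
static size_t posix_cc_response_len(uint8_t owner_auths, uint8_t group_auths)
{
    return POSIX_CC_FIXED_LEN
           + sid_wire_size(owner_auths)
           + sid_wire_size(group_auths);
}
```

For two SIDs with five sub-authorities each this gives 12 + 2*28 = 68 bytes, rather than the 12 bytes currently produced.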

Info level

New info level

#define SMB2_FIND_POSIX_INFORMATION	0x64

via GETINFO or QUERY_DIR

context length = 68+12

data  content
#----part1-----
leu64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_btime) // birth
leu64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_atime) // access
leu64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_mtime) // last write
leu64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_ctime) // change
leu64 # bytes used on disk
leu64 file size
leu32 dos attributes
leu64 inode
leu64 device (major?)
leu64 zero
#----part2-----
le32  SMB_STRUCT_STAT->st_ex_nlink // number of hardlinks
le32  FILE_FLAG_REPARSE            // symlinks?
le32  unix_perms_to_wire(SMB_STRUCT_STAT->st_ex_mode & ~S_IFMT)
#----part missing..----
sid   sid_owner
sid   sid_group

// sid size = 8 + 4*sid->num_auths;
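
The four timestamps in part1 are written as 64-bit little-endian values. The sketch below assumes the usual NT time definition (100 ns units since 1601-01-01 UTC) for that encoding. It is not Samba's put_long_date_timespec, which also deals with special values and rounding that are ignored here; the names timespec_to_nttime and put_le64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Seconds between 1601-01-01 and the Unix epoch 1970-01-01. */
#define NT_EPOCH_OFFSET_SECS 11644473600ULL

/* Convert a non-negative Unix timespec to NT time (100 ns ticks since 1601). */
static uint64_t timespec_to_nttime(struct timespec ts)
{
    assert(ts.tv_sec >= 0 && ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return ((uint64_t)ts.tv_sec + NT_EPOCH_OFFSET_SECS) * 10000000ULL
           + (uint64_t)ts.tv_nsec / 100;
}

/* Store a 64-bit value in little-endian byte order (the "leu64" fields). */
static void put_le64(uint8_t *buf, uint64_t v)
{
    assert(buf != NULL);
    for (int i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (8 * i));
    }
}
```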

BUG?
DATA_BLOB smb2_posix_cc_info(TALLOC_CTX *mem_ctx,
	size_t b_size = 12;
	....
	/* Now add in the owner and group sids. */
	sid_linearize(ret_blob.data + 12,
			b_size - 12,
			&sid_owner);
	sid_linearize(ret_blob.data + 12 + owner_sid_size,
			b_size - owner_sid_size - 12,
			&sid_group);

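The first sid_linearize call is a no-op because b_size is always 12, so the available length passed is 0. The second is a bug because b_size - owner_sid_size - 12 is computed in size_t and wraps around to a huge value.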
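
The wrap-around can be reproduced in isolation. The sketch below is not a proposed patch for smb2_posix_cc_info; it only contrasts the unchecked subtraction with a guarded one, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked form, as in the excerpt: wraps when used + 12 exceeds total. */
static size_t space_unchecked(size_t total, size_t used)
{
    return total - used - 12;
}

/* Guarded form: report 0 bytes left instead of wrapping around. */
static size_t space_checked(size_t total, size_t used)
{
    if (total < 12 || total - 12 < used) {
        return 0;
    }
    return total - 12 - used;
}
```

With b_size fixed at 12 and a 28-byte owner SID, the unchecked form yields SIZE_MAX - 27, so the callee is told it has almost unlimited room in a 12-byte buffer. Sizing the blob from the two SID lengths before linearizing would avoid both problems.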

POSIX extensions codepaths

SMB2_OP_QUERY_DIRECTORY:
 smbd_smb2_request_process_query_directory
 smbd_smb2_query_directory_send
 smbd_dirptr_lanman2_entry
 smbd_marshall_dir_entry
  store_smb2_posix_info <--- sends #1+#2
    smb2_posix_cc_info

SMB2_OP_GETINFO:
 smbd_smb2_getinfo_send
 smbd_do_qfilepathinfo
  store_smb2_posix_info <--- sends #1+#2
    smb2_posix_cc_info

SMB2_OP_CREATE:
 smbd_smb2_create_send
 smbd_smb2_create_after_exec
    smb2_posix_cc_info  <--- sends #2 in POSIX create context