SMB3-Linux
Latest revision as of 21:13, 28 September 2022
There are various requirements for full POSIX compatibility, and other
requirements which although not strictly POSIX (such as support for
symlinks and the fallocate system call) are common in Linux and
various Unix variants and useful to applications. The goal is to
implement emulation strategies and extensions to the SMB3 protocol
which are as small as reasonably possible but implement the most
important of these missing features, allowing the network file system
to appear nearly identical to a local file system to users and the
applications they run, without creating unacceptable performance or
configuration problems.
Requirements
In this document, POSIX CC stands for POSIX Create Context, a chunk of data that can optionally be included in a Create request or response.
The general requirements for SMB3 POSIX extensions include the following:
POSIX mode bits
The primitive 07777 bits used to control who can access a file or directory (RWX bits for user, group, and other, plus the sticky, setuid, and setgid bits).
status
Multiple ways to implement it:
- Emulatable via ACLs. cifs.ko can try its best to map the mode bits to Windows ACLs. This is implemented via the cifsacl mount option.
- The Windows NFS server stores mode bits as a special ACL. This is not the same as emulating them. It stores them in ACL entries with a SID that is "invalid" and in which the last sub-auth carries the POSIX information. There is one entry with the UID, one with the GID, and one with the mode bits.
- The SMB2 POSIX extensions add a Create Context that the client can use to pass mode bits.
Notes:
- mkdir setuid/setgid: In Linux, mkdir() strips setuid and setgid bits (not a bug).
- mkdir user read/execute: Samba returns access denied on mkdir of a directory whose mode lacks read and execute permission for the owner, regardless of whether the directory was successfully created. It needs u=rx to succeed. This needs to be worked around in cifs.ko (TODO: try mkdir + setinfo?)
POSIX file ownership
UID and GID owners. Windows typically only has one or the other, and expresses them as global "SIDs" with longer UUIDs rather than locally defined UIDs.
status
See POSIX mode bits status.
Symbolic links
Windows now has the concept of reparse points. Reparse points are used to implement symlinks on Windows.
status
- Write symlinks as a plaintext file with a special header and content. Implemented in cifs.ko with the mfsymlinks mount option. The "mfsymlinks" approach is used by Apple among others. Will be in kernel 3.18 and later. Should be backportable to earlier kernels.
- Reuse the way Windows Server for NFS stores Unix symlinks, i.e. reparse points (note that the reparse point tag is different from the one used for regular Windows symlinks).
Case sensitivity
status
Files opened with the POSIX Create Context get POSIX semantics, including case sensitivity.
No reserved path characters
Mapping 7 reserved characters (not allowed in SMB3/CIFS/NTFS/Windows but allowed in POSIX). They include: * ? < > : | \
status
There are 2 ways to do this:
- Send the path unmodified with a POSIX CC
- Map the reserved characters to an unreserved but "invalid" Unicode range. 2 mappings already exist:
  - Microsoft's "SFU" (SUA) mapping
  - Apple's "SFM" mapping
The SFU mapping is available in CIFS (and SMB3 in 3.18) with the "mapchars" mount option but we plan to use the Apple ("SFM") mapping approach by default in 3.18 kernel and later (Samba requires the "vfs_fruit" module to implement the Apple mapping of the seven reserved characters).
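The remapping idea can be sketched in a few lines of Python. This is only an illustration: it assumes an SFU-style scheme in which each reserved character is shifted by 0xF000 into the Unicode private use area. That offset and the function names are assumptions, not something stated on this page, and the SFM mapping uses a different table.

```python
# Sketch of an SFU-style reserved-character remap.
# ASSUMPTION: each reserved character maps to 0xF000 + its ASCII code,
# which lands in the Unicode private use area.
RESERVED = '*?<>:|\\'   # the seven reserved characters listed above
SFU_OFFSET = 0xF000     # assumed offset, not taken from this page


def map_reserved(name: str) -> str:
    """Replace reserved characters with private-use code points."""
    return ''.join(
        chr(ord(c) + SFU_OFFSET) if c in RESERVED else c for c in name
    )


def unmap_reserved(name: str) -> str:
    """Reverse map_reserved() so applications see the POSIX name again."""
    return ''.join(
        chr(ord(c) - SFU_OFFSET)
        if ord(c) >= SFU_OFFSET and chr(ord(c) - SFU_OFFSET) in RESERVED
        else c
        for c in name
    )
```

With such a mapping, a name like a:b?.txt is sent to the server without any reserved character and is converted back on the client when listing the directory.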
mkfifo and mknod
status
These are emulated using the same approach that Microsoft SFU and others did. Uses the "sfu" mount option (available in 3.18 kernel or later).
POSIX unlink and rename behavior
- unlink: deleting an open file, which removes it from the namespace, is allowed in POSIX but not in Windows
- rename: renaming a directory that has open files is perfectly legal in POSIX but not in Windows (even recursively)
status
Emulatable over SMB3 for most cases (using "delete on close" and using an approach like "nfs silly rename"). 3.18 kernel will better handle these but the "POSIX Create Context" is still likely to be required for a few use cases.
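The local behavior that the emulation has to reproduce can be shown with a small Python sketch. It assumes a local Linux file system, and the function name is hypothetical.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Unlink a file while it is open, then read it back through the fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                    # the name leaves the namespace
        assert not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))   # data still reachable via the fd
    finally:
        os.close(fd)                       # storage is released here
```

On a POSIX file system the read succeeds even though the name is gone, which is the behavior "delete on close" and silly rename try to approximate over SMB3.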
POSIX byte range locks
POSIX "advisory" byte range locks (SMB3 allows Windows style "mandatory" byte range locks). POSIX locks are also merged when they overlap, and all locks are released on file close making them both confusing to use (locally on Linux file systems, and even more so over network file systems) and more difficult to emulate. Although many dislike the POSIX byte range lock behavior, their implementation in SMB3 would help some applications.
status
POSIX CC will enable POSIX flavor of locks on the handle.
Emulated via mandatory locks today, and can also be "local only" (with a cifs.ko mount option "nobrl").
flock
In POSIX, flock(2) locks are file locks applied to an open file descriptor. They apply to the whole file but they are advisory: applications are free to ignore them and read/write on the fd, whereas SMB locks will prevent reads and writes.
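A minimal sketch of the advisory behavior, assuming a local Linux file system (the function name is hypothetical): a second flock() attempt on the same file is refused, but plain I/O that ignores the lock still goes through.

```python
import fcntl
import os
import tempfile


def flock_is_advisory():
    """Return (second_lock_blocked, bytes_written_without_lock)."""
    fd1, path = tempfile.mkstemp()
    fd2 = os.open(path, os.O_RDWR)   # separate open file description
    try:
        fcntl.flock(fd1, fcntl.LOCK_EX)
        try:
            # A cooperating process would check the lock like this.
            fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)
            second_lock_blocked = False
        except BlockingIOError:
            second_lock_blocked = True
        # Nothing stops I/O that simply ignores the lock.
        written = os.write(fd2, b'x')
        return second_lock_blocked, written
    finally:
        os.close(fd2)
        os.close(fd1)
        os.unlink(path)
```

With SMB mandatory locks, the write on the second handle would be rejected instead.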
More information returned in stat() syscall
- Slight differences in "stat" system call (and the mode/ownership information noted above)
- Additional information returned by the "statfs" system call:
f_files; /* total file nodes in file system */
f_ffree; /* free file nodes in fs */
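For reference, the two fields can be inspected locally from Python. This sketch uses os.statvfs, which wraps statvfs(3) rather than statfs(2) but exposes the same two counters; the helper name is hypothetical.

```python
import os


def inode_counts(path: str = '/') -> dict:
    """Return the inode counters that plain SMB3 does not carry."""
    st = os.statvfs(path)
    return {
        'f_files': st.f_files,   # total file nodes in the file system
        'f_ffree': st.f_ffree,   # free file nodes in the file system
    }
```

Some local file systems report 0 for both values, so a client has to tolerate that case as well.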
status
- stat: Use POSIX information level to get additional stat fields in QUERY INFO and FIND requests.
- statfs: fields still missing
POSIX ACL support
Linux implements an ACL model for local file systems which is less complex than the more common "RichACLs" (i.e. NFSv4 or NTFS/SMB/SMB3 ACLs) and easier to understand.
status
Could be mapped to SMB3/NTFS RichACLs which are a superset of POSIX ACLs. Also could be handled via "POSIX Create Context".
fallocate() parameters
Many fallocate options are available, most but not all are mappable to various existing SMB3 ioctls.
TODO: examples
status
Partially implemented already. The same applies to a few other new Linux syscalls which are not broadly implemented: more research needed.
Code & tests
- Wireshark: git repo at https://github.com/aaptel/wireshark.git (smb3unix branch)
- Samba: git repo at git://git.samba.org/jra/samba/.git (master-smb2 branch)
- Samba rebased against 4.17: git repo at https://gitlab.com/samba-team/devel/samba.git (dmulder/master-smb2 branch)
- Linux kernel: latest POSIX code at git://git.samba.org/sfrench/cifs-2.6.git (for-next branch)
- Test client code in Pike (python): https://github.com/aaptel/pike.git (smb3unix branch)
Sample smb.conf for samba (see pike README):
[global]
    server max protocol = SMB3_11
    smb3 unix extensions = yes

[share]
    create mask = 07777
    directory mask = 07777
    mangled names = no
    path = /tmp/share
    read only = no
    guest ok = yes
Linux kernel mount options:
mount -t smb3 //<address>/<share> /mnt -o username=<user>,password=<pass>,vers=3.1.1,posix,mfsymlinks,nomapposix,noperm
POSIX extension wire protocol status
As of 2018-12-13 from JRA's master-smb2 branch. (commit 1db5d5d4254 "s3: smbd: smb2-posix: Return STOPPED_ON_SYMLINK when hitting reparse point partway in a path.")
Note that all integers are in Little-Endian.
Specification
You can find a work-in-progress specification document here: https://codeberg.org/SMB3UNIX/smb3_posix_spec
Negotiate Context
SMB2_POSIX_EXTENSIONS 0x100
The data field in the negotiate context MUST be 16 bytes, and contain the following bytes:
\x93\xAD\x25\x50\x9C\xB4\x11\xE7\xB4\x23\x83\xDE\x96\x8B\xCD\x7C
These bytes uniquely identify the SMB3 Posix version.
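As an illustration, the negotiate context could be serialized as below. The 8-byte header layout (ContextType, DataLength, Reserved) is the generic SMB2 negotiate context header and is an assumption here, since this page only specifies the type value and the 16 data bytes. The function name is hypothetical, and the padding between consecutive contexts is left out.

```python
import struct

SMB2_POSIX_EXTENSIONS = 0x100
# The 16 identifying bytes given above.
POSIX_TAG = bytes.fromhex('93AD25509CB411E7B42383DE968BCD7C')


def build_posix_negotiate_context() -> bytes:
    """Serialize the POSIX negotiate context (little-endian integers)."""
    # ASSUMED header: ContextType (u16), DataLength (u16), Reserved (u32)
    header = struct.pack('<HHI', SMB2_POSIX_EXTENSIONS, len(POSIX_TAG), 0)
    return header + POSIX_TAG
```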
Create Context
For client requests
New create context. If a file is opened with this context, the handle gets the POSIX_SEMANTICS flag set.
- Context tag:
SMB2_CREATE_TAG_POSIX "\x93\xAD\x25\x50\x9C\xB4\x11\xE7\xB4\x23\x83\xDE\x96\x8B\xCD\x7C"
- Context payload size: 4 bytes
Unix permission mode to be used for the new file/dir. The bits used are as follows (note the values are in octal):
#define UNIX_X_OTH   0000001
#define UNIX_W_OTH   0000002
#define UNIX_R_OTH   0000004
#define UNIX_X_GRP   0000010
#define UNIX_W_GRP   0000020
#define UNIX_R_GRP   0000040
#define UNIX_X_USR   0000100
#define UNIX_W_USR   0000200
#define UNIX_R_USR   0000400
#define UNIX_STICKY  0001000
#define UNIX_SET_GID 0002000
#define UNIX_SET_UID 0004000
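A minimal sketch of the 4-byte request payload in Python. It assumes the wire bits equal the local Linux permission bits, which holds for the values listed above, so no per-bit translation is done. The function name is hypothetical.

```python
import stat
import struct


def posix_cc_request_payload(mode: int) -> bytes:
    """Drop the file-type bits and pack the permissions as a le32."""
    # stat.S_IMODE keeps only the 07777 permission bits of the mode.
    perms = stat.S_IMODE(mode)
    return struct.pack('<I', perms)
```

For example, a regular file mode of 0100644 yields the bytes a4 01 00 00.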
For server responses
The server can respond to CREATE request with this POSIX context too (same context tag).
- Context payload size can vary because of the SID, but the maximum should be 12 + 2*28 = 68 bytes.
u32 SMB_STRUCT_STAT->st_ex_nlink   // number of hardlinks
u32 FILE_FLAG_REPARSE              // "reparse_tag", 0 for regular files, will be used for FIFO, symlinks, etc...
u32 unix_perms_to_wire(SMB_STRUCT_STAT->st_ex_mode & ~S_IFMT)
sid sid_owner
sid sid_group
A sid is encoded as follows. Its size is 8 + 4*num_auths bytes, so it can go up to 28 bytes:
u8  sid_rev_num
u8  num_auths                  (range 0-5)
buf id_auth                    (6 bytes)
[u32 sub_auth] * num_auths     (variable length)
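The response layout above can be decoded with a short Python sketch. It assumes the usual SID conventions (the 6-byte id_auth read as a big-endian integer, sub-authorities as le32) and no padding between fields. The function names are hypothetical.

```python
import struct


def parse_sid(buf: bytes, offset: int = 0):
    """Return (sid_string, bytes_consumed) for the SID at buf[offset:]."""
    rev, num_auths = struct.unpack_from('<BB', buf, offset)
    # ASSUMPTION: id_auth is a 6-byte big-endian value, as in Windows SIDs.
    id_auth = int.from_bytes(buf[offset + 2:offset + 8], 'big')
    subs = struct.unpack_from('<%dI' % num_auths, buf, offset + 8)
    parts = ['S', str(rev), str(id_auth)] + [str(s) for s in subs]
    return '-'.join(parts), 8 + 4 * num_auths


def parse_posix_cc_response(buf: bytes) -> dict:
    """Decode nlink, reparse tag, permissions and both SIDs."""
    nlink, reparse_tag, perms = struct.unpack_from('<III', buf, 0)
    owner, used = parse_sid(buf, 12)
    group, _ = parse_sid(buf, 12 + used)
    return {
        'nlink': nlink,
        'reparse_tag': reparse_tag,
        'perms': perms,
        'owner': owner,
        'group': group,
    }
```

Because each SID carries its own num_auths, the group SID can only be located after the owner SID has been measured.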
Info level
New info level requestable via GETINFO or FIND. The payload contains a POSIX Create Context response at the end.
- Level value:
SMB2_FIND_POSIX_INFORMATION 0x64
- Payload length: up to 136 bytes.
  - 68 bytes of fixed fields + POSIXCreateContextResponse (up to 68 bytes, see above)
u64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_btime) // birth
u64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_atime) // access
u64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_mtime) // last write
u64 put_long_date_timespec(SMB_STRUCT_STAT->st_ex_ctime) // change
u64 # bytes used on disk
u64 file size
u32 dos attributes
u64 inode
u32 SMB_STRUCT_STAT->st_ex_dev // device ID
u32 zero
POSIXCreateContextResponse (size=68 bytes)
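The 68 fixed bytes can be unpacked as sketched below. The sketch assumes the timestamps are NT times (100 ns units since 1601-01-01), which is what put_long_date_timespec is understood to produce. The dictionary keys and function names are hypothetical.

```python
import struct

# Four timestamps, allocation size, file size, DOS attributes,
# inode, device ID, zero: 68 bytes, little-endian, no padding.
FIXED = struct.Struct('<QQQQQQIQII')
NT_EPOCH_DELTA = 11644473600   # seconds from 1601-01-01 to 1970-01-01


def nt_time_to_unix(nt: int) -> float:
    """Convert 100 ns ticks since 1601 to seconds since 1970 (assumed format)."""
    return nt / 10_000_000 - NT_EPOCH_DELTA


def parse_posix_info_fixed(buf: bytes) -> dict:
    """Decode the fixed part that precedes the POSIX create context response."""
    (btime, atime, mtime, ctime, alloc, size,
     dos_attr, inode, dev, _zero) = FIXED.unpack_from(buf, 0)
    return {
        'btime': nt_time_to_unix(btime),
        'atime': nt_time_to_unix(atime),
        'mtime': nt_time_to_unix(mtime),
        'ctime': nt_time_to_unix(ctime),
        'alloc': alloc,
        'size': size,
        'dos_attr': dos_attr,
        'inode': inode,
        'dev': dev,
    }
```

The POSIX create context response then starts at offset FIXED.size, i.e. byte 68 of the payload.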
For FIND (directory listing) there is some extra data at the start (offset to the next directory entry) and the file name at the end:
u32   next_offset
u32   ignored
POSIXInformation
u32   file_name_byte_count
utf16 file_name (NOT UTF8!)
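Walking a FIND response buffer could look like the sketch below. Several points are assumptions rather than facts from this page: next_offset is taken to be relative to the start of the current entry with 0 marking the last entry (the usual SMB2 directory listing convention), no padding is expected between the create context response and the name length, and the function names are hypothetical.

```python
import struct

FIXED_INFO_LEN = 68   # fixed POSIXInformation fields
CC_FIXED_LEN = 12     # nlink + reparse tag + perms


def sid_len(buf: bytes, offset: int) -> int:
    """Size of the SID at offset: 8 + 4 * num_auths."""
    return 8 + 4 * buf[offset + 1]


def parse_find_entry(buf: bytes, offset: int = 0):
    """Return (file_name, next_offset) for one directory entry."""
    next_offset, _ignored = struct.unpack_from('<II', buf, offset)
    pos = offset + 8 + FIXED_INFO_LEN + CC_FIXED_LEN
    pos += sid_len(buf, pos)   # skip owner SID
    pos += sid_len(buf, pos)   # skip group SID
    (name_len,) = struct.unpack_from('<I', buf, pos)
    name = buf[pos + 4:pos + 4 + name_len].decode('utf-16-le')
    return name, next_offset


def iter_find_names(buf: bytes):
    """Yield every file name in a FIND response buffer."""
    offset = 0
    while True:
        name, next_offset = parse_find_entry(buf, offset)
        yield name
        if next_offset == 0:   # ASSUMPTION: 0 marks the last entry
            break
        offset += next_offset
```

The name length is a byte count, not a character count, so the slice is taken before decoding from UTF-16.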
POSIX extensions codepaths in samba
SMB2_OP_QUERY_DIRECTORY:
    smbd_smb2_request_process_query_directory
    smbd_smb2_query_directory_send
    smbd_dirptr_lanman2_entry
    smbd_marshall_dir_entry
    store_smb2_posix_info <--- sends next_offset + info + posix cc rsp + filename (length + utf16)
    smb2_posix_cc_info

SMB2_OP_GETINFO:
    smbd_smb2_getinfo_send
    smbd_do_qfilepathinfo
    store_smb2_posix_info <--- sends info + posix cc rsp
    smb2_posix_cc_info

SMB2_OP_CREATE:
    smbd_smb2_create_send
    smbd_smb2_create_after_exec
    smb2_posix_cc_info <--- sends POSIX create context resp