Cifs.ko-testing

From SambaWiki
Latest revision as of 13:56, 31 May 2021

It would be nice to have a good testing environment for cifs.ko.

  • Something to check for regressions after backporting something.
  • But also to debug relatively quickly while working on a fix or feature.


Ideally we want to be able to:

  • Test cifs.ko (for-next, stable branches also?) against multiple server implementations including Windows servers.
  • Test multi-machine configuration. For example domain based DFS configurations might require 3 machines on the server side.
  • Test multiple mount options.
  • Get a network trace of the exchange.
  • Get kernel console output, including crashes, oopses, and coredumps to analyze later with the crash utility.
  • Simulate network failure (unplugging network interface or dropping packets from server)
  • Detect kernel hangups and crashes somewhat gracefully.

What to test

Here's a broad checklist of what to test after adding or changing something. Try to make sure the behaviour doesn't change (in a bad way) and combine the parameters (yes, that's potentially a lot of testing). If you have some knowledge of what your change touches, you can probably skip the combinations it cannot affect.

Can you mount different servers?

  • upstream samba master
  • Windows Server
  • samba as shipped by different vendors (SUSE, Redhat, ...), at least the latest product version

Can you use different SMB versions?

mount options:

  • (nothing) - use default version
  • vers=1.0 - SMB1
  • vers=2.1 - SMB2.1
  • vers=3.0 - SMB3
  • vers=3.11 - SMB3.1.1
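A quick way to cover all of these is to loop over the dialects. This is a dry-run sketch that only prints the commands; SERVER, SHARE and the credentials file are placeholders — replace echo with the real mount once they match your setup:

```shell
#!/bin/sh
# Dry-run sketch: print one mount command per SMB dialect to test.
# SERVER/SHARE/CREDS are placeholders for your own setup.
SERVER=${SERVER:-192.168.1.10}
SHARE=${SHARE:-testshare}
CREDS=${CREDS:-/root/smb-credentials}

# "" = no vers option, i.e. the default version
for vers in "" 1.0 2.1 3.0 3.11; do
    opts="credentials=$CREDS"
    [ -n "$vers" ] && opts="$opts,vers=$vers"
    echo "mount -t cifs //$SERVER/$SHARE /mnt -o $opts"
done
```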

Can you use encryption?

Encryption can be global (all session traffic encrypted) or per-share (all traffic on that tree connection encrypted).

Encryption is not supported for SMB1 (there is a Samba extension for it, but it is deprecated). The client and the server can each be in one of these states:

  • Unsupported (off): the client/server doesn't support encryption.
  • Supported (on): the client/server supports encryption but doesn't require it (unencrypted by default)
  • Required (req): the client/server must fail if the other end cannot use it.
                   client
              off   on   req
  server off   N     N    F
         on    N     ?    Y
         req   F     Y    Y

  • F: fail, the connection should not be made and an error should be reported
  • N: connection is made unencrypted
  • Y: connection is made encrypted
  • ?: unclear; it SHOULD probably encrypt here, but it is not mandatory


  • samba smb.conf: smb encrypt = off|enabled|required (per share or global)
  • cifs.ko: supported by default, use seal mount option to make it required.
  • Windows: New-SmbShare -Name myshare -Path C:\dir -EncryptData $true
    • Since you need to log in (session setup) before connecting to a share, there is some overlap between share-level and server-level encryption. See what MS has to say.

Can you use signing?

Controlled by sec= mount option and /proc/fs/cifs/SecurityFlags. Like encryption it can be unsupported, enabled or required by the server/client.

  • sec=ntlmsspi to force signing when providing credentials (non-kerberos).
  • sec=krb5i to force signing when using kerberos tickets.

  • samba: server signing = disabled|auto|mandatory. With SMB2 signing cannot be off/disabled.
  • Windows: should be on by default.

Can you login using kerberos?


On the client you need to use the AD DNS server to resolve domain and hosts properly:

  • Either directly in /etc/resolv.conf
  • Or as an extra dnsmasq rule server=/foo.com/<AD server IP>. This will use that DNS server for everything under *.foo.com.
  • Setup the kerberos configuration (/etc/krb5.conf)

[libdefaults]
    dns_lookup_realm = true
    dns_lookup_kdc = true
    forwardable = true
    default_realm = FOO.COM

[logging]
    kdc = FILE:/var/log/krb5/krb5kdc.log
    admin_server = FILE:/var/log/krb5/kadmind.log
    default = FILE:/var/log/krb5/def.log
  • Get kerberos ticket
kinit aaptel@FOO.COM
  • Make sure the tickets are there
$ klist
Ticket cache: DIR::/run/user/1000/krb5cc/tkt
Default principal: aaptel@FOO.COM

Valid starting       Expires              Service principal
04/16/2018 15:08:33  04/17/2018 01:08:33  krbtgt/FOO.COM@FOO.COM
        renew until 04/17/2018 15:08:32
04/16/2018 15:08:33  04/17/2018 01:08:33  cifs/foo-ad.foo.com@FOO.COM
        renew until 04/17/2018 15:08:32
  • Try to login with smbclient first
smbclient //foo-ad.foo.com/share -k
  • For debugging use:
KRB5_TRACE=/dev/stderr smbclient //foo-ad.foo.com/share -k
  • Make sure you /etc/request-key.conf has an entry for cifs upcalls and DNS resolving.
create  dns_resolver *          *               /sbin/key.dns_resolver %k
create  cifs.spnego     *       *               /usr/sbin/cifs.upcall %k
  • Finally mount with -o sec=krb5,cruid=aaptel. Make sure to use the kerberos username via cruid.

Common issues

  • Make sure you use ALLCAPS for the domain name in /etc/krb5.conf and in kinit command.
  • Make sure you use the right hostname when accessing the server (foo.com vs foo-ad.foo.com).
  • Kerberos requires the client and server clocks to be in sync. Sync the client with NTP or similar.

Does the reconnection code work?

The network can fail on multiple levels.

There is a great explanation of some aspects of the reconnection code on Sachin Prabhu's blog (archive link if it ever dies).

Relevant mount option: echo_interval=n

  Sets the interval at which echo requests are sent to the server on an
  idling connection. This setting also affects the time required for a
  connection to an unresponsive server to time out. Here n is the echo
  interval in seconds. Reconnection happens at twice the value of the
  echo_interval set for an unresponsive server.
  If this option is not given then the default value of 60 seconds is used.
  The minimum tunable value is 1 second and the maximum is 600 seconds.
  • Try to have open files while reconnecting; cifs.ko is supposed to reopen them transparently.
  • Try to disconnect/reconnect at various points of the cifs.ko lifetime

QEMU unplugging

Use QEMU monitor console to plug/unplug the network cable with set_link <iface> <on|off> and wait for cifs.ko timeout to elapse.

When using QEMU nographic mode with the serial console connected to the terminal:

# hit Ctrl-a Ctrl-c to toggle between serial console and QEMU monitor shell
# you can use TAB to list/complete ifaces

(qemu) set_link network0 off
(qemu) [   43.392267] e1000: eth0 NIC Link is Down

# first the keepalive thread doing echo request fails
# NOTE: this is not always where it fails first

[   66.787917] fs/cifs/smb2pdu.c: In echo request
[   66.788507] __smb_send_rqst: 15 callbacks suppressed
[   66.788508] fs/cifs/transport.c: Sending smb: smb_len=68
[  126.944605] fs/cifs/smb2pdu.c: In echo request
[  126.945411] fs/cifs/smb2pdu.c: Echo request failed: -11
[  126.946310] fs/cifs/connect.c: Unable to send echo request to server: foo-ad.foo.com
[  133.838917] CIFS VFS: Server foo-ad.foo.com has not responded in 120 seconds. Reconnecting...

# reconnection started 

[  133.839949] fs/cifs/connect.c: Reconnecting tcp session
[  133.840565] fs/cifs/connect.c: cifs_reconnect: marking sessions and tcons for reconnect
[  133.841386] fs/cifs/connect.c: cifs_reconnect: tearing down socket
[  133.842036] fs/cifs/connect.c: State: 0x3 Flags: 0x0
[  133.842588] fs/cifs/connect.c: Post shutdown state: 0x3 Flags: 0x0
[  133.843235] fs/cifs/connect.c: cifs_reconnect: moving mids to private list
[  133.843933] fs/cifs/connect.c: cifs_reconnect: issuing mid callbacks
[  133.844593] cifs_small_buf_release: 14 callbacks suppressed
[  133.844594] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[  133.845862] fs/cifs/connect.c: Socket created
[  133.846327] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[  135.342594] fs/cifs/connect.c: Error -113 connecting to server
[  135.343345] fs/cifs/connect.c: reconnect error -113
[  138.348603] fs/cifs/connect.c: Socket created
[  138.349374] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[  138.356107] fs/cifs/connect.c: Error -113 connecting to server
[  138.357099] fs/cifs/connect.c: reconnect error -113
[  141.362706] fs/cifs/connect.c: Socket created
[  141.363265] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[  141.422423] fs/cifs/connect.c: Error -113 connecting to server
[  141.423149] fs/cifs/connect.c: reconnect error -113
[  144.424365] fs/cifs/connect.c: Socket created
[  144.424929] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58

# fails since no cable.. let's plug it back in

(qemu) set_link network0 on
(qemu) [  147.432481] fs/cifs/connect.c: Error -113 connecting to server
[  147.433237] fs/cifs/connect.c: reconnect error -113
[  149.836864] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  150.434434] fs/cifs/connect.c: Socket created
[  150.434988] fs/cifs/connect.c: sndbuf 16384 rcvbuf 87380 rcvtimeo 0x1b58
[  150.460966] fs/cifs/smb2pdu.c: In echo request
[  150.461541] fs/cifs/smb2pdu.c: Need negotiate, reconnecting tcons
[  150.462285] fs/cifs/smb2pdu.c: Negotiate protocol
[  150.462851] fs/cifs/transport.c: Sending smb: smb_len=102
[  150.522457] cifs_demultiplex_thread: 15 callbacks suppressed
[  150.522459] fs/cifs/connect.c: RFC1002 header 0xf8
[  150.523582] smb2_check_message: 15 callbacks suppressed
[  150.523584] fs/cifs/smb2misc.c: smb2_check_message length: 0xfc, smb_buf_length: 0xf8
[  150.524923] smb2_calc_size: 10 callbacks suppressed
[  150.524924] fs/cifs/smb2misc.c: SMB2 data length 120 offset 128
[  150.526050] smb2_calc_size: 17 callbacks suppressed
[  150.526051] fs/cifs/smb2misc.c: SMB2 len 252
[  150.526987] cifs_sync_mid_result: 15 callbacks suppressed
[  150.526994] fs/cifs/transport.c: cifs_sync_mid_result: cmd=0 mid=0 state=4
[  150.528260] fs/cifs/misc.c: Null buffer passed to cifs_small_buf_release
[  150.528942] fs/cifs/smb2pdu.c: mode 0x3
[  150.529347] fs/cifs/smb2pdu.c: negotiated smb3.0 dialect

# ..yadda yadda..


  • use tcpkill to reset the TCP connection state instead of silently dropping packets


iptables

  • Simulate the server abruptly going down by dropping all packets we receive from it:
iptables -I INPUT -s <server ip> -j DROP
  • Restore the connection by allowing server packets again:
iptables -I INPUT -s <server ip> -j ACCEPT

Faking network delay with tc

Sometimes adding large network delays can trigger bugs or make reproducing race conditions easier. You can fake it on an interface with tc (Traffic Control). NOTE: this requires CONFIG_NET_SCH_NETEM=y in the kernel config.


  • Adding 300ms on eth0
tc qdisc add dev eth0 root netem delay 300ms
  • Removing it
tc qdisc del dev eth0 root

Can you mount a sub-path?

Try mounting //SERVER/SHARE/sub/path.

Can you mount a DFS share?

DFS is when you use a kind of inter-server symlink. When cifs.ko traverses a link (while mounting OR while changing directory into one) it automatically connects to the destination of the link. You have a root host storing links to other hosts. cifs.ko can mount root hosts, subpaths on root hosts, links, and subpaths in links.

  • Try mounting a DFS setup where you have 2 servers and a link from the first to the second.
  • Try to mount:
    • //SERVERA/DFSROOT/ and ls && cd link && ls && cd path && ls
    • //SERVERA/DFSROOT/link and ls && cd path && ls
    • //SERVERA/DFSROOT/link/path/ and ls
  • Try mounting a domain-based DFS setup where you have 3 servers (one extra indirection):
    • Domain host (A) with no file shares
    • Namespace server (B)
    • Storage server (C) with the final file shares

cifs.ko should:

  • connect to (A) IPC share
  • send a DFS query on it
  • get (B) in the results
  • connect to (B) IPC share
  • send a DFS query on it
  • get (C) in the results
  • connect to (C) IPC share
  • send DFS query on it
  • get nothing back
  • ...continue regular mount procedure from (C)...



Single host configuration (link pointing to the same server):

mkdir /tmp/dfsroot /tmp/dfstarget
cd /tmp/dfsroot && ln -s 'msdfs:\\<samba host ip>\dfstarget' link

And in smb.conf:

[global]
host msdfs = yes

[dfsroot]
path = /tmp/dfsroot
msdfs root = yes

[dfstarget]
path = /tmp/dfstarget

Does the xfstests test suite report regressions?

Despite its name, xfstests is a generic filesystem test suite developed on kernel.org. It tests low-level filesystem behaviour.
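A minimal local.config sketch for running xfstests against cifs mounts. The server address, share names, mountpoints and credentials are all placeholders, and TEST_DEV/SCRATCH_DEV must point at two separate shares:

```shell
# Sketch of an xfstests local.config for cifs.ko testing.
# All addresses, shares and credentials below are placeholders.
export FSTYP=cifs
export TEST_DEV=//192.168.1.10/test
export TEST_DIR=/mnt/test
export SCRATCH_DEV=//192.168.1.10/scratch
export SCRATCH_MNT=/mnt/scratch
export CIFS_MOUNT_OPTIONS="-o username=testuser,password=x,vers=3.11"
```

Then run e.g. ./check -g quick from the xfstests tree.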

Samba setup

Here's a sample samba setup you can use as a base and tweak.

  • Create a directory layout for the shares
mkdir -p /tmp/shares/{dfsroot,dfstarget/sub,encrypt_off,encrypt_on,encrypt_req}
touch /tmp/shares/dfsroot/in_root /tmp/shares/dfstarget/{in_target,sub/in_target_sub}
cd /tmp/shares/dfsroot

# use your ip here..
export SERVER_IP=""
ln -s 'msdfs:\'$SERVER_IP'\dfstarget' link
ln -s 'msdfs:\'$SERVER_IP'\dfstarget\sub' linksub

# if you don't have a user already
root$ smbpasswd -a $user
  • Use this smb.conf as a base, tweak while you test

[global]
server min protocol = NT1
server max protocol = SMB3_11

# yes|no
unix extensions = yes

# session encryption (global)
# off|enabled|required
smb encrypt = default

# disabled|auto|mandatory
server signing = default

# enable the DFS subsystem
host msdfs = yes

[dfsroot]
path = /tmp/shares/dfsroot
msdfs root = yes

[dfstarget]
path = /tmp/shares/dfstarget

# per share encryption (make one of each)
# off|enabled|required

[encrypt_off]
path = /tmp/shares/encrypt_off
smb encrypt = off

[encrypt_on]
path = /tmp/shares/encrypt_on
smb encrypt = enabled

[encrypt_req]
path = /tmp/shares/encrypt_req
smb encrypt = required