Asking for Help
The best place to ask for help with Linux CIFS is on the linux-cifs mailing list. When asking for help, it's best to provide some basic info:
- The kernel version you're using (the output of uname -r)
- The mount.cifs version you're using (mount.cifs -V)
- A clear, concise description of the problem
- A description of the CIFS server with which you're having trouble (Windows version if it's Windows, Samba version if it's Samba, name of the appliance if it's something else)
- If you're able to mount the share, the contents of /proc/fs/cifs/DebugData
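The machine-readable details above can be collected in one go. This is a minimal bash sketch, not an official tool. The function name cifs_collect_info is hypothetical. It assumes mount.cifs is on your PATH, and it often lives in /sbin, so you may need to run it as root. The problem and server descriptions still have to be written by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the basic info requested when asking for help.
cifs_collect_info() {
    echo "Kernel version: $(uname -r)"

    # mount.cifs may be missing from a non-root PATH (often /sbin/mount.cifs).
    if command -v mount.cifs >/dev/null 2>&1; then
        mount.cifs -V 2>&1
    else
        echo "mount.cifs: not found in PATH"
    fi

    # DebugData only exists once the cifs module is loaded.
    if [ -r /proc/fs/cifs/DebugData ]; then
        echo "--- /proc/fs/cifs/DebugData ---"
        cat /proc/fs/cifs/DebugData
    else
        echo "--- /proc/fs/cifs/DebugData is not readable ---"
    fi
}

# Usage: cifs_collect_info > cifs-info.txt
```

Add the description of the problem and of the server to the resulting file before sending it.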
The CIFS code contains a number of debugging statements that can be enabled. If you ask for help on the list, one of the developers may ask you for this info. You can also turn it on yourself, but the output is generally not very useful unless you're willing to dig into the code.
To enable debugging, echo a non-zero value into /proc/fs/cifs/cifsFYI. For example:
# modprobe cifs
# echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
# echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
# echo 7 > /proc/fs/cifs/cifsFYI
To disable it:
# echo 0 > /proc/fs/cifs/cifsFYI
These messages end up in the kernel ring buffer. You can view them using dmesg.
syslog will generally pick up much of it as well, but if messages arrive at a high rate, syslog tends to drop some of them. Reading the ring buffer directly is generally preferred, since that's lossless.
However, this debugging can be rather chatty and can have a significant impact on performance. It's best used with easily reproducible problems, like this:
- turn on debugging
- reproduce the issue
- turn off debugging
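The turn on, reproduce, turn off sequence can be wrapped in a small script so that debugging is never left enabled by accident. This is a minimal bash sketch, not an official tool. The function name cifs_debug_run and the CIFS_FYI_FILE override are hypothetical. It must run as root and uses the value 7 from the example above. It assumes that any dynamic_debug lines your kernel needs have already been run.

```shell
#!/usr/bin/env bash
# Hypothetical override so the path can be changed (e.g. for testing).
CIFS_FYI_FILE=${CIFS_FYI_FILE:-/proc/fs/cifs/cifsFYI}

# cifs_debug_run LOGFILE COMMAND [ARGS...]
# Enables cifsFYI, runs COMMAND to reproduce the issue, disables cifsFYI,
# then saves the kernel ring buffer to LOGFILE. Returns COMMAND's status.
cifs_debug_run() {
    local log=$1
    shift
    echo 7 > "$CIFS_FYI_FILE" || return 1   # turn on debugging
    "$@"                                     # reproduce the issue
    local rc=$?
    echo 0 > "$CIFS_FYI_FILE"                # turn off debugging, even on failure
    # Read the ring buffer directly; syslog may have dropped messages.
    dmesg > "$log" 2>/dev/null || echo "warning: could not read the kernel ring buffer" >&2
    return "$rc"
}
```

For example, cifs_debug_run /tmp/cifs-debug.txt ls /mnt/cifs, where /mnt/cifs stands in for your own mount point.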
Debugging info can contain sensitive data like IP addresses and filenames. Take care when sending this information.
It's sometimes helpful to capture wire traffic between the client and server. The easiest way to do this is with Wireshark, a graphical network analysis tool. In many cases, however, it's not easy or possible to run Wireshark directly on one of the hosts. In that case, it's often easier to capture the network traffic to a file in binary format and then load it into an analyzer later. That also makes it possible to send the capture to someone who can analyze it.
Here's an example of doing this:
# tcpdump -i eth0 -s0 -w /tmp/cifs-traffic.pcap host cifs_server.example.com and port 445
or alternatively, if this is a large capture and you want to limit its size, truncate each captured packet to a reasonable maximum (here 200 bytes) by setting the snapshot length with -s:
# tcpdump -i eth0 -s200 -w /tmp/cifs-traffic.pcap host cifs_server.example.com and port 445
...of course, tcpdump has a lot of options, so these are just examples. In particular, you'll want to modify the capture filter depending on which machine you're running the capture on, etc. An excellent overview presentation describing using Wireshark to trace SMB workloads can be found at https://www.snia.org/sites/default/orig/sdc_archives/2008_presentations/monday/RonnieSahlberg_UsingWireshark.pdf
The captured traffic in the above example will be in /tmp/cifs-traffic.pcap. Before sending capture files around, it's a good idea to compress them, as they squash down fairly well.
In general, the SMB protocol can be fairly chatty, so it's best to use this in the same manner as the debugging above:
- start the capture
- reproduce the problem
- stop the capture
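The same start, reproduce, stop pattern can be scripted for a packet capture. This is a minimal bash sketch that wraps the first tcpdump command above and compresses the result with gzip. The function name cifs_capture_run and the two-second startup delay are assumptions. It needs root or equivalent capture privileges.

```shell
#!/usr/bin/env bash
# cifs_capture_run IFACE SERVER OUTFILE COMMAND [ARGS...]
# Hypothetical helper: capture SMB traffic (port 445) to/from SERVER while
# COMMAND reproduces the problem, then compress the capture to OUTFILE.gz.
cifs_capture_run() {
    local iface=$1 server=$2 out=$3
    shift 3

    # Start the capture in the background (full packets, -s0).
    tcpdump -i "$iface" -s0 -w "$out" host "$server" and port 445 &
    local pid=$!
    sleep 2                     # arbitrary delay so tcpdump is ready

    "$@"                        # reproduce the problem
    local rc=$?

    # Stop the capture; tcpdump finishes writing the file on SIGTERM.
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null || true

    gzip -f "$out"              # capture files usually compress well
    return "$rc"
}
```

For example, cifs_capture_run eth0 cifs_server.example.com /tmp/cifs-traffic.pcap ls /mnt/cifs leaves /tmp/cifs-traffic.pcap.gz behind. The mount point is only a placeholder.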
Wire captures can also contain sensitive data like addresses, password hashes, filenames and data. Be careful to whom you send it. In general, don't send this to mailing lists unless you know that the data isn't sensitive.
Occasionally the kernel will panic. When it does, it's helpful to capture the entire message including the kernel messages leading up to the oops. There's a lot of info in an oops message but the main thing that helps debugging is determining where the machine panicked. Here's one way to do this:
Save off the oops message. The main thing that you see in there is a dump of the registers on the CPU that panicked. For instance, an oops on a 32-bit ix86 machine might look something like this:
BUG: unable to handle kernel NULL pointer dereference at 00000414
IP: [<c110d057>] cifs_writepages+0x35/0x60a
...the "IP:" line refers to the instruction pointer. That tells us which instruction the CPU was executing at the time it panicked. The problem, though, is that due to architecture and compiler differences, etc., we can't directly map that address to a line of source code. Here's how to do that:
Open the kernel module with gdb:
$ gdb cifs.ko
...eventually it should come to a (gdb) prompt. If you're running a vendor kernel, then you may need debuginfo packages for this to work. Once you get a gdb prompt, run:
(gdb) list *(cifs_writepages+0x35)
...obviously, you should replace the text in the parentheses with whatever your oops message says. Pasting the list output can help developers help you.
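If you have several oopses to look at, the symbol and offset can be extracted from the "IP:" line automatically. This is a minimal bash sketch. The function name oops_to_gdb_cmd is hypothetical. The sed pattern assumes the "symbol+0xoffset/0xlength" format shown above. It also accepts lines like "RIP: 0010:symbol+0x.../0x...", which is an assumption about how newer x86-64 kernels print the instruction pointer.

```shell
#!/usr/bin/env bash
# oops_to_gdb_cmd LINE
# Hypothetical helper: turn an oops instruction-pointer line such as
#   IP: [<c110d057>] cifs_writepages+0x35/0x60a
# into the gdb command "list *(cifs_writepages+0x35)".
oops_to_gdb_cmd() {
    local loc
    # Keep "symbol+0xoffset" and drop the "/0xlength" suffix.
    loc=$(printf '%s\n' "$1" |
        sed -n 's/.*IP:.*[[:space:]:]\([A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-fA-F]*\)\/0x[0-9a-fA-F]*.*/\1/p' |
        head -n 1)
    [ -n "$loc" ] || return 1   # no instruction pointer found
    printf 'list *(%s)\n' "$loc"
}
```

The result can be fed to gdb non-interactively, e.g. gdb -batch -ex "$(oops_to_gdb_cmd "$line")" cifs.ko, where $line holds the "IP:" line from your oops.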
It can be helpful to know whether the client timed out and had to reconnect to the server. Even if it reconnected successfully, some pending commands could fail if the share is not mounted with the "hard" mount option. Check the "session" and "share reconnects" values in /proc/fs/cifs/Stats ("cat /proc/fs/cifs/Stats | grep reconnect") before the failure and again after it to see whether they have increased. If the "session reconnects" and/or "share reconnects" value has increased, an operation has timed out (sometimes due to a network failure, a server or file system hang, or some other bug). In addition, "dmesg" (the kernel message buffer) will often show a message similar to the following: "CIFS VFS: Server 172.22.149.109 has not responded in 120 seconds. Reconnecting …"
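Comparing the counters by hand is easy to get wrong, so they can be recorded before and after a reproducer. This is a minimal bash sketch. It assumes the Stats file contains a line of the form "N session M share reconnects". The function names and the CIFS_STATS_FILE override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical override so the path can be changed (e.g. for testing).
CIFS_STATS_FILE=${CIFS_STATS_FILE:-/proc/fs/cifs/Stats}

# Print "<session> <share>" reconnect counts, assuming a line such as
# "0 session 0 share reconnects" in the Stats file.
cifs_reconnects() {
    awk '/ session .* share reconnects/ { print $1, $3; exit }' "$CIFS_STATS_FILE"
}

# cifs_watch_reconnects COMMAND [ARGS...]
# Runs COMMAND and reports whether the reconnect counters changed.
# Returns 0 if unchanged, 1 if they changed, 2 if Stats can't be read.
cifs_watch_reconnects() {
    local before after
    before=$(cifs_reconnects) || return 2
    "$@"
    after=$(cifs_reconnects) || return 2
    if [ "$before" != "$after" ]; then
        echo "reconnects (session share): $before -> $after"
        return 1
    fi
    echo "no reconnects"
}
```

If the counters changed, check dmesg for a "has not responded" message like the one quoted above.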
Additional Debugging Features Coming Soon!
Support for trace-cmd (ftrace) in cifs.ko is in the next version of Linux and will be available in the 4.18 kernel. This will allow selective control of cifs tracepoints via trace-cmd ("trace-cmd record -e cifs") or via /sys/kernel/debug/tracing/events/cifs.
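On kernels that have the cifs tracepoints, trace-cmd can wrap a reproducer directly. For example, run "trace-cmd record -e cifs ls /mnt/cifs" and then "trace-cmd report". The mount point here is a placeholder. Without trace-cmd, the same events can be toggled through the tracing directory. This is a minimal bash sketch of that approach. The function name cifs_trace_run and the TRACING_DIR override are hypothetical. It needs root and assumes debugfs is mounted at /sys/kernel/debug.

```shell
#!/usr/bin/env bash
# Hypothetical override; the tracing files live here when debugfs is mounted.
TRACING_DIR=${TRACING_DIR:-/sys/kernel/debug/tracing}

# cifs_trace_run LOGFILE COMMAND [ARGS...]
# Clears the trace buffer, enables all cifs trace events, runs COMMAND,
# disables the events and copies the trace buffer to LOGFILE.
# Returns COMMAND's status.
cifs_trace_run() {
    local log=$1
    shift
    echo > "$TRACING_DIR/trace"                             # clear old entries
    echo 1 > "$TRACING_DIR/events/cifs/enable" || return 1   # enable cifs events
    "$@"                                                     # reproduce the issue
    local rc=$?
    echo 0 > "$TRACING_DIR/events/cifs/enable"               # disable them again
    cat "$TRACING_DIR/trace" > "$log"
    return "$rc"
}
```

Like the cifsFYI output, trace data can contain filenames and addresses, so review it before sharing.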