SMB Traffic Analyzer (SMBTA)
What is SMB Traffic Analyzer
SMB Traffic Analyzer (SMBTA from here on) is a software suite that uses the Samba CIFS server to create statistics about the data traffic on a Samba network. SMBTA consists of several components:
- a module in the VFS (Virtual File System) Layer of Samba
- a system daemon program
- user oriented programs to visualize the statistics
SMBTA stores data about the traffic on a network in an SQL database. Because this data is available via SQL, system administrators can query it with standard database tools.
SMBTA also has real-time interfaces that make it possible to watch Samba traffic live on the console, or to feed rrdtool in order to create a round-robin database.
Knowledge of SQL is not required. The user can instead run the client programs to query the database. These are tailored to the layout of the database and take over much of the work of translating frequently asked questions into SQL statements. They are also network-capable and can run on a completely different system from the one that hosts the SQL storage.
The basic idea and main concept of SMBTA was born at the SambaXP conference in Göttingen, Germany, in 2007. At the SNIA conference in 2008, I was able to present a prototype of the idea to the Samba team. The code for the module was accepted immediately and was introduced to Samba users with Samba 3.2.0.
This document refers to version 2.0 of the SMB Traffic Analyzer VFS module.
Please check SMB Traffic Analyzer’s home page for newer versions of the software or this document.
Why use SMB Traffic Analyzer
Samba services may run on special hardware, such as RAID systems or other specialized storage. With SMBTA, the user can get an overview of how the services on the network are used. SMBTA can answer questions like these:
- Which of my services is the most used one?
- At which times does the usage of my service peak?
- Which service almost never gets used?
- Which users are the highest traffic generators?
- What were the most recent file activities on a share, in detail?
- Which file is accessed most often for reading and writing?
Isn’t SMBTA a tool to control employees?
Yes, it can be used like that, but it doesn’t need to be. Storing user names is even forbidden in many countries. Therefore, SMBTA allows for anonymization of any data linked to a person. There are many other use cases for SMBTA, such as sizing and planning hardware for your services, or analyzing the network before switching to clustered Samba.
The basic concept of SMBTA
The Samba CIFS server features a Virtual File System (VFS) layer that allows modules to be linked into it, replacing or enhancing the functions of the VFS. Once linked in, modules behave completely transparently. SMBTA makes use of this functionality by adding a module to Samba’s VFS. In the VFS layer, the module collects data from prominent VFS functions such as write, read, and close.
The collected data may be encrypted, and is sent over the network to a receiver program. This daemon program, smbtad, then builds an SQL storage from the data. In addition to the generated SQL storage, client programs can be used to query the storage in a specialized way over the network.
Where to get SMBTA
The SMBTA VFS module is called vfs_smb_traffic_analyzer.so and is packaged with Samba 3.2.0 and later.
The other parts of SMBTA, the daemon program smbtad as well as the client programs webSMBTA, smbtaquery and smbtamonitor, are available as source code from the project’s homepage at http://holger123.wordpress.com/smb-traffic-analyzer/.
Building SMBTA from the sources
The SMBTA tools smbtaquery and smbtamonitor use cmake as their primary build tool, as does the smbtad daemon program.
Before trying to build smbtad and smbtatools, check for the following requirements:
- cmake (required for both)
- libsmbclient library (required for smbtatools)
- curses/ncurses library (required for smbtatools)
- libDBI (required for both)
With cmake, you usually do an "out-of-source" build. That means the source is not built in the original source directory. Here is an example build. Suppose you’ve unpacked the smbtatools source tarball at (the same procedure can be followed to compile smbtad):
You can now create your own directory where you want to build smbtatools, for example:
mkdir /home/tom/smbtatools-build
cd /home/tom/smbtatools-build
If any libraries or other required components live outside the system paths, specify those directories through environment variables before running cmake:
setenv CMAKE_INCLUDE_PATH $YOUR_PATH_TO_EXTERNAL_INCLUDES:$ANOTHER_DIRECTORY_WITH_EXTERNAL_INCLUDES
setenv CMAKE_LIBRARY_PATH $YOUR_PATH_TO_LIBRARIES:$ANOTHER_DIRECTORY_WITH_LIBRARIES
Both smbtatools and smbtad require sqlite to run. If it isn’t installed on your system, or the installed version is too old, both packages build the sqlite copy they include and link against it.
You are now in the build directory. Just run cmake from here:
If you want the later "make install" to install control files for SMBTA into /etc/init.d, set this cmake cache value:
cmake -D CMAKE_INSTALL_RCFILE:string=yes ../smbtad/
cmake will then check the requirements of smbtatools/smbtad against your system. If anything is missing, it prints human-readable messages saying what. After a successful cmake run, you can run "make" and "make install" in this directory as usual ("make install" might require root privileges):
make
make install
To install the package into a different destination directory, use:
make DESTDIR=/$YOUR/$DESTINATION install
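The build steps above can be collected into a small helper that only assembles the command lines, so they can be reviewed before anything is run. This is a minimal sketch: the function names and the example source path "../smbtatools" are hypothetical, while the CMAKE_INSTALL_RCFILE cache value, the DESTDIR variable and the two CMAKE_*_PATH variables are the ones named in the text.

```python
def build_commands(source_dir, install_rcfile=False, destdir=None):
    """Return the commands for an out-of-source build.

    The commands are meant to be run from inside the build directory;
    nothing is executed here.
    """
    cmake = ["cmake"]
    if install_rcfile:
        # Cache value from the text: install control files into /etc/init.d.
        cmake += ["-D", "CMAKE_INSTALL_RCFILE:string=yes"]
    cmake.append(str(source_dir))

    install = ["make"]
    if destdir is not None:
        install.append(f"DESTDIR={destdir}")
    install.append("install")

    return [cmake, ["make"], install]


def cmake_search_env(include_dirs=(), library_dirs=()):
    """Build the environment variables for components outside the system paths."""
    env = {}
    if include_dirs:
        env["CMAKE_INCLUDE_PATH"] = ":".join(str(d) for d in include_dirs)
    if library_dirs:
        env["CMAKE_LIBRARY_PATH"] = ":".join(str(d) for d in library_dirs)
    return env
```

Each returned list can be handed to subprocess.run(argv, cwd=build_dir, check=True), with the result of cmake_search_env merged into the environment of the cmake call.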
Setting up SMBTA
Upgrading from a former version
SMBTA provides an update path to convert databases used with former versions to the current one.
The smbtaquery program includes a command line option "-C --convert":
smbtaquery -M $DB-DRIVER -N $DATABASE-NAME -S $DATABASE-USER -H $DATABASE-HOST -C
This process is self-explanatory: it presents a menu from which you choose the version of SMBTA you are coming from. Please consult the chapter about the smbtaquery program to read more about the arguments to the -M, -N, -S and -H options.
About the database
To run SMBTA, you’ll need to set up a database. Currently supported databases are:
- Postgresql, tested successfully by the SMBTA development team.
- sqlite3, tested by the SMBTA development team.
- MySQL, untested by the SMBTA development team, but supported by libDBI. Feedback is highly welcome!
Furthermore, SMBTA will need the following information to access the database:
- The name of a user that has access to the database
- The user’s password
- The hostname or IP address of the system that is running the database
- The name of the database to access
Initial setup of the database
If you are installing SMBTA for the first time, SMBTA assumes that the administrator has set up an empty database, with a user that has access to it. The smbtad component can then prepare the empty database on an initial run with "smbtad -T", which creates all required tables. See the chapter on smbtad for more about this.
Activating the VFS module on Samba Servers
In general, the Samba CIFS server is configured via smb.conf, its main configuration file. A typical service definition looks like this:
[ExampleShare]
    path = /space
    read only = no
This defines the share "ExampleShare" as a CIFS service, referring to the path /space on the server’s storage.
By extending the share definition with the commands to load the SMBTA VFS module, SMBTA logging for this service can be activated:
[ExampleShare]
    path = /space
    read only = no
    vfs objects = smb_traffic_analyzer
    smb_traffic_analyzer:protocol_version = V2
    smb_traffic_analyzer:host = localhost
    smb_traffic_analyzer:port = 3490
The host parameter names the system to which the VFS module sends its data; the module connects to that system on the port number given with the port parameter.
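To see at a glance which shares have the module activated and where they send their data, a share definition like the one above can be read programmatically. The following is a minimal sketch using Python's configparser; it assumes a simple smb.conf without includes or line continuations, and the function name smbta_shares is hypothetical.

```python
import configparser

SAMPLE_SMB_CONF = """\
[ExampleShare]
path = /space
read only = no
vfs objects = smb_traffic_analyzer
smb_traffic_analyzer:protocol_version = V2
smb_traffic_analyzer:host = localhost
smb_traffic_analyzer:port = 3490

[PlainShare]
path = /other
"""

MODULE = "smb_traffic_analyzer"


def smbta_shares(smb_conf_text):
    """Map each share that loads the SMBTA module to its module parameters."""
    # Only "=" may separate key and value; configparser's default would
    # also split "smb_traffic_analyzer:host" at the colon.
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.read_string(smb_conf_text)

    prefix = MODULE + ":"
    result = {}
    for share in parser.sections():
        modules = parser.get(share, "vfs objects", fallback="").split()
        if MODULE not in modules:
            continue
        result[share] = {
            key[len(prefix):]: value
            for key, value in parser.items(share)
            if key.startswith(prefix)
        }
    return result
```

Calling smbta_shares(SAMPLE_SMB_CONF) returns only "ExampleShare", together with its protocol_version, host and port settings.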
Unix domain socket
The module can also use a unix domain socket, which is useful if smbtad runs on the same system as the Samba server. In this case the share definition looks like this:
[ExampleShare]
    path = /space
    read only = no
    vfs objects = smb_traffic_analyzer
    smb_traffic_analyzer:protocol_version = V2
    smb_traffic_analyzer:mode = unix_domain_socket
With the mode parameter set to "unix_domain_socket", the module uses a unix domain socket at /var/tmp/stadsocket.
Anonymization of user-related information can be enabled in the VFS module. Two modes of anonymization are available.
The first method generates a hash number from the username and appends it to a prefix given in the configuration. Thus, a user called "holger" might become "user123" if "user" was the given prefix.
[ExampleShare]
    path = /space
    read only = no
    vfs objects = smb_traffic_analyzer
    smb_traffic_analyzer:protocol_version = V2
    smb_traffic_analyzer:mode = unix_domain_socket
    # enable prefix based anonymization
    smb_traffic_analyzer:anonymize_prefix = user
The second method anonymizes all users completely. It simply replaces the username (and the SID) with the given prefix.
[ExampleShare]
    path = /space
    read only = no
    vfs objects = smb_traffic_analyzer
    smb_traffic_analyzer:protocol_version = V2
    smb_traffic_analyzer:mode = unix_domain_socket
    # enable total anonymization mapped to "user"
    smb_traffic_analyzer:anonymize_prefix = user
    smb_traffic_analyzer:total_anonymization = yes
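The difference between the two anonymization modes can be illustrated with a few lines of Python. This is only a sketch of the idea: the hash function used by the VFS module is not described here, so zlib.crc32 serves as a stand-in, and the function name anonymize_user is hypothetical.

```python
import zlib


def anonymize_user(username, prefix, total=False):
    """Return the name that would be stored instead of the real username.

    total=False: prefix plus a number derived from the username, so the
                 same user always maps to the same anonymous name.
    total=True:  every user collapses to the bare prefix.
    """
    if total:
        return prefix
    # Stand-in hash; the module's real hash function is an assumption here.
    checksum = zlib.crc32(username.encode("utf-8"))
    return f"{prefix}{checksum}"
```

With prefix-based anonymization, per-user statistics remain possible because each user keeps a stable pseudonym. With total anonymization, all traffic is attributed to one name and only per-share or per-file statistics stay meaningful.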
Setting up smbtad
The program smbtad runs on the module’s target host. Its main tasks are to fill an SQL storage with the data it receives from the module, to maintain that storage, and to accept requests from clients.
smbtad -i 3490 -q 3491
smbtad is a daemon program that can be run by any user. By default, it creates a directory '$HOME/.smbtad/' where it stores its database. The above call makes smbtad listen for the VFS module’s connections on port 3490, and for client connections on port 3491.
By default, smbtad creates a database as '$HOME/.smbtad/staddb'. If the database already exists, it will use the existing database.
The command line options of smbtad
If you pass an option that smbtad doesn’t understand, it prints its list of supported options. For example, if the user calls:
The following output will appear:
SMB Traffic Analyzer daemon version 1.2.4
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
(C) 2008-2011 Holger Hetterich <firstname.lastname@example.org>
-S --dbuser                 Specifiy the user for the database.
-H --dbhost                 Specifiy the host of the database.
-P --dbpassword             Specifiy to password to access the db.
-M --dbdriver               Specify the libDBI driver to use.
-N --dbname                 Specify the name of the database.
-i --inet-port              Specifiy the port to be used. Default: 3490.
-u --unix-domain-socket     If this parameter is specified, a unix domain
                            socket at /var/tmp/stadsocket will be used.
-d --debug-level            Specify the debug level (0-10). Default: 0.
-q --query-port             Port to be used for clients.
-o --interactive            Don't run as daemon. (Runs as daemon by default)
-c --config-file            Use configuration file given.
-t --maintenance-timer <value>
                            specify the time intervall to to start the
                            database maintenance routine. Format is HH:MM:SS
                            Example: -m 00:30:00 will run the maintenance
                            routine every half hour. Default: 01:00:00
-m --maintenance-timer-config <value>
                            specify a number of days and a time. Every
                            database entry which is older than the the
                            specified number of days will be deleted by the
                            maintenance routine. Format is: DAYS, HH:MM:SS
                            Default: 1,00:00:00
-k --keyfile                Keyfile for encryption to be used between
                            module and smbtad.
-p --precision              Precision value for the build-in cache.
                            Default is 5.
-U --use-db                 Specify 0 or 1 as argument. If this is 0, no
                            database handling will be done. Default is 1.
-T --setup                  Do the initial database setup and exit.
- -i --inet-port: Specify the internet socket port on which smbtad listens for data from the VFS module. If no port number is given, the default of 3490 is used.
- -d --debug-level: Specify the debug level when running smbtad. If smbtad crashes and you want to produce a bug report, please run it with -d 10, which is the highest debug level. The default value is 0, in which case only fatal errors are reported. The debug messages smbtad produces are sent to syslog and thus appear in your system log.
- -o --interactive: For debugging reasons, it can be useful to not run smbtad as a daemon program. When run with -o, smbtad will not become a daemon program. The default is to run in daemon mode.
- -c --config-file: The user can provide a configuration file instead of providing the command line switches.
- -q --query-port: This is the internet socket port to which clients can connect when they want to request real-time information. If this parameter is not given, it is set to 3491 (or in other words, to the value of -i --inet-port + 1).
- -t --maintenance-timer: To keep the database from growing without bound, smbtad includes a maintenance process that cleans up the database according to a rule (see -m --maintenance-timer-config). The -t option specifies, in the format HH:MM:SS, how often the maintenance process runs. For example, if you give 00:10:00, the maintenance process is started every ten minutes.
- -m --maintenance-timer-config: Here you specify what the maintenance process deletes. The format is DAYS, HH:MM:SS. For example, to delete anything in the database that is older than one day, give 1,00:00:00 (which is the default). To delete anything older than 2 weeks, give 14,00:00:00. To delete everything older than 2 hours, give 0,02:00:00.
- -u --unix-domain-socket: If this option is given, a unix domain socket under /var/tmp/stadsocket will be used for the connection to the VFS module. This is useful if smbtad is to be run on the same machine that runs the Samba server.
- -n --unix-domain-socket-cl: If this option is given, a unix domain socket under /var/tmp/stadsocket_client will be used for the connection to clients such as smbtaquery. This is useful if smbtad is to be run on the same machine that runs the tools.
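The two time formats used by -t (HH:MM:SS) and -m (DAYS,HH:MM:SS) are easy to get wrong when written by hand. The following sketch shows how such values could be checked before they are passed to smbtad; the function names are hypothetical, and the parsing rules are an assumption based only on the formats described above.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an HH:MM:SS interval as used by -t / --maintenance-timer."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    if hours < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError(f"not a valid HH:MM:SS value: {text!r}")
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_retention(text):
    """Parse DAYS,HH:MM:SS as used by -m / --maintenance-timer-config."""
    days, comma, hms = text.partition(",")
    if not comma:
        raise ValueError(f"expected DAYS,HH:MM:SS but got: {text!r}")
    return timedelta(days=int(days)) + parse_hms(hms)
```

For instance, parse_retention("14,00:00:00") yields a retention of two weeks, and parse_hms("00:10:00") an interval of ten minutes.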
New options since 1.2.3:
- -p --precision: As of version 1.2.3, smbtad sums up similar VFS read/write entries in its cache and stores only the sum of transferred bytes instead of every single VFS entry. The cache is insert-sort based, and the summing makes the resulting database much smaller than in former releases. Because of this aggregation, statistical results become slightly imprecise. The -p/--precision argument lets the user control the time span in seconds over which the cache sums up similar VFS R/W entries. The default is 5 seconds. The more seconds the user specifies, the smaller the resulting database and the less precise the results when queried through smbtaquery. For real data mining, this argument can be set to 0. In this mode, smbtad doesn’t sum up any VFS entries and stores them as they are in the database. This is the behaviour of smbtad in releases before 1.2.3.
- -U --use-db: As of version 1.2.3 of smbtad, database usage can be switched off completely. This is useful when SMBTA is only to be used with rrddriver or smbtamonitor, which rely only on real-time data. The option requires an integer argument that is either 0 or 1. If it is 1, which is the default, the database is handled; if it is 0, only real-time data is supported. Please note that with --use-db=0, some monitor functionality behaves differently, such as the TOTAL monitor, which is supposed to report the total sum of transferred bytes of an object. As the database is not available, it cannot query for the initial sum and starts at 0. Also, the real-time tools rrddriver and smbtamonitor run an identification procedure by default, which must be switched off with the tools’ -I/--identify switch. If you want to use SMBTA for rrddriver or smbtamonitor only, it is recommended to set this option to 0.
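The effect of the -p/--precision value on the size of the database can be illustrated with a small model. This is not smbtad's cache implementation (which is insert-sort based); it is a simplified sketch that groups entries into fixed time windows, and the tuple layout and function name are hypothetical.

```python
def aggregate(entries, precision=5):
    """Sum up similar read/write entries that fall into the same time window.

    entries:   list of (timestamp, user, share, filename, operation, nbytes)
               tuples, with timestamp in seconds.
    precision: window length in seconds; 0 keeps every entry as it is.
    """
    if precision <= 0:
        return list(entries)

    sums = {}
    for timestamp, user, share, filename, operation, nbytes in entries:
        window_start = int(timestamp // precision) * precision
        key = (window_start, user, share, filename, operation)
        sums[key] = sums.get(key, 0) + nbytes

    # One row per window and kind of access instead of one row per VFS call.
    return [key + (total,) for key, total in sums.items()]
```

With a precision of 5, two writes by the same user to the same file at seconds 0 and 3 become one row carrying the summed byte count, while a write at second 6 starts a new row. With a precision of 0, nothing is merged.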
New options for database-settings with libDBI since 1.2.4:
- -S --dbuser: Specifies the user of the database, and must be a valid user of the database.
- -H --dbhost: Specifies the host name of the system where the database runs.
- -P --dbpassword: Specifies the password of the user (given with -S --dbuser) to access the database.
- -M --dbdriver: Specifies the driver to use for the database connection.
- Database drivers are:
- "pgsql" for Postgresql.
- "mysql" for MySQL (untested).
- "sqlite3" for Sqlite3 (untested).
- -N --dbname: Specifies the name of the database to be used for the database connection.
- -T --setup: As of version 1.2.4 of smbtad, the database needs to be set up with the -T --setup switch. This accesses the database and creates its initial tables and structure. smbtad does not daemonize after this call; it just returns normally.
- -I --ip: This command line switch specifies the interface or network address to bind smbtad to when in network operation. It can either be an IPv4 address, an IPv6 address, or a fully qualified hostname, which will be resolved by smbtad. If this option is not given, it defaults to "localhost".
Using a configuration file with smbtad
All the options mentioned in the previous section can also be set in a configuration file. The configuration file has the same format as the ini files known from the Windows platform. An example configuration file is included in the smbtad package, in the /dist directory. We will go through a complete configuration file in this chapter, describing all the options. The configuration file is divided into sections, such as "general" or "network". Lines beginning with # are considered comments.
# The general section defines options causing
# changes for the whole application
[general]
# The debug level defines the verbosity in syslog
# of smbtad. Values from 0 to 10 are supported, 0
# being the normal mode, and 10 being totally verbose.
# Note that debug level 10 is causing a speed penalty.
# If you think you've found a bug, and try to reproduce it,
# please run smbtad with debug_level = 10.
debug_level = 0
# use_db is the equivalent to the -U --use-db command line
# argument.
# if use_db is 0, any handling of the database will not be
# done.
# This is useful if SMBTA is only to be used with rrddriver
# or smbtamonitor, and thus only relies on real time data.
# Default is 1.
use_db = 1

[network]
# The smbtad_ip option sets the network address / interface that
# smbtad should bind to for network operations.
# it can either be:
# A full IPv4 Address, such as:
# smbtad_ip = 192.168.178.23
# A full IPv6 Address, such as:
# smbtad_ip = ::ffff:192.168.178.31
# Or a full hostname, such as:
# smbtad_ip = smbtad.host.de
# (In this case smbtad will check the host for its ip address and
# uses this.)
#
# If the option is not given, it will default to "localhost"
smbtad_ip = localhost
# The query_port option defines the internet socket port
# to be used for talking to clients such as
# smbtamonitor.
query_port = 3491
# The unix_domain_socket option specifies whether a unix domain
# socket is used for the connection to the VFS module.
# Its arguments are either yes, or no.
unix_domain_socket = no
# The unix_domain_socket_clients option specifies whether a unix
# domain socket is used for the connection to real-time clients
unix_domain_socket_clients = no

[database]
# The "name" option specifies the name of the database to be used
# for smbtad. The database must have been prepared by using
# smbtad -T.
name = dbname
#
# The "host" option specifies the name or IP address of the system
# running the database
host = examplehost.test.ex
#
# The "driver" option specifies the name of the database driver
# to use to access the database. Valid values are
# pgsql (postgresql), mysql (mysql), or sqlite3 (sqlite)
driver = pgsql
#
# The "user" option specifies the name of the user to use to
# access the database.
user = testuser
#
# The "password" option specifies the password of the user given
# with the "user" option.
password = testpassword

[maintenance]
# To keep the database from growing without bound, a maintenance process is
# included in smbtad. The option "interval" sets the interval
# at which the maintenance procedure is run. For example, if you
# give "00:10:00" as argument, a maintenance procedure will be run
# every ten minutes.
interval = 01:00:00
# The config parameter defines how the maintenance process should work.
# The format is DAYS, HH:MM:SS. For example, if you want to delete
# anything in the database that is older than one day, you would
# give 1,00:00:00 (which is the default). To delete
# anything in the database that is older than 2 weeks, you would
# give 14,00:00:00 as parameter. To delete everything that
# is older than 2 hours, give 0,02:00:00 as parameter.
config = 01,00:00:00
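Because the configuration file uses the ini format, it can be checked with standard tooling before smbtad is started. The sketch below reads a configuration with Python's configparser and applies the value ranges and defaults stated in this chapter; the function name and the selection of checks are assumptions, and smbtad itself may validate differently.

```python
import configparser

VALID_DRIVERS = {"pgsql", "mysql", "sqlite3"}

SAMPLE_CONFIG = """\
[general]
debug_level = 0
use_db = 1

[network]
smbtad_ip = localhost
query_port = 3491
unix_domain_socket = no

[database]
name = dbname
host = examplehost.test.ex
driver = pgsql
user = testuser
password = testpassword

[maintenance]
interval = 01:00:00
config = 01,00:00:00
"""


def load_smbtad_config(text):
    """Read a smbtad configuration and check a few documented constraints."""
    # interpolation=None keeps "%" in passwords from being interpreted.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)

    settings = {
        "debug_level": parser.getint("general", "debug_level", fallback=0),
        "use_db": parser.getint("general", "use_db", fallback=1),
        "smbtad_ip": parser.get("network", "smbtad_ip", fallback="localhost"),
        "query_port": parser.getint("network", "query_port", fallback=3491),
        "unix_domain_socket": parser.getboolean(
            "network", "unix_domain_socket", fallback=False
        ),
        "driver": parser.get("database", "driver", fallback=None),
    }

    if not 0 <= settings["debug_level"] <= 10:
        raise ValueError("debug_level must be between 0 and 10")
    if settings["use_db"] not in (0, 1):
        raise ValueError("use_db must be 0 or 1")
    if settings["use_db"] == 1 and settings["driver"] not in VALID_DRIVERS:
        raise ValueError("driver must be one of: " + ", ".join(sorted(VALID_DRIVERS)))
    return settings
```

A configuration that switches the database off (use_db = 0) passes without a [database] section, which matches the real-time-only setup described for rrddriver and smbtamonitor.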
Controlling smbtad through the rcsmbtad script
The smbtad distribution includes LSB-compliant scripts to start and stop the service and to check its availability. Provided you have set up a configuration file for smbtad in /etc/smbtad.conf, the scripts will use it as the configuration for smbtad. smbtad can then be
- started with:
- stopped with:
- and checked for availability with:
Checking the installation
When everything is set up, move some data on the shares for which you have activated the VFS module. For example, after writing a file to a share, you can check the installation by looking into the database directly with the sqlite3 command line interface:
sqlite3 /var/lib/staddb
sqlite> select * from write;
You should now see entries for the data written by the user who did the transfer. If this works, your SMBTA installation works fine and you can start pointing more shares to it.
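The same check can be scripted, for example to run it after every installation. The sketch below counts the rows of the "write" table with Python's sqlite3 module. Only the table name is taken from the query above; the function name is hypothetical, and nothing is assumed about the table's columns.

```python
import sqlite3


def count_rows(connection, table="write"):
    """Return the number of rows in the given table."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    (count,) = connection.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    return count
```

Against a real installation this would be called as count_rows(sqlite3.connect("/var/lib/staddb")); a result greater than zero after writing a file to a share indicates that data is arriving in the database.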
Using the client programs
Using a configuration file with the client programs
All client programs described in the following sections can make use of a configuration file. Since all of the tools have quite similar options and parameters, they can share a single configuration file, which by default is located in
The following sample configuration file describes all possible options. For further questions about these options, please see the chapter below about the program in question.
[general]
# In general, the programs are more verbose by running them
# at debug level 10. debug level 0 is the default
# debug_level = 10

[network]
# The "smbta_port_number" option is used for all real-time
# programs that are using a direct connection to smbtad,
# such as smbtamonitor or rrddriver.
# It is the port number on which smbtad is listening for
# incoming client connections
smbta_port_number = 3491
# The "smbta_host" option is used as the host to connect
# to that has smbtad running. This option is used for all
# real-time programs that require a direct connection
# to smbtad, such as smbtamonitor or rrddriver.
smbta_host = examplehost.ex.di
# The "unix_domain_socket" option is used for all real-time
# programs that are using a direct connection to smbtad,
# such as smbtamonitor or rrddriver.
# If this option is set to something, such as "yes",
# the connection to smbtad will be done by a unix domain socket
# instead of an internet socket.
# The default behaviour is to use an internet socket, and
# if this option is not given, it won't be used.
# unix_domain_socket = yes

[database]
# The "host" parameter is used for smbtaquery. Its argument
# is the hostname or ip-address of the system that is running
# the database.
host = dbhost.example.ex
# The "user" parameter is used for smbtaquery. Its argument
# is the name of the user on the database system to use for
# doing queries.
user = testuser
# The "password" parameter is used for smbtaquery. Its argument
# is the password of the user given in the "user" parameter.
password = password
# The "driver" parameter names the libDBI driver to be
# used to make the connection, and thus specifies the database
# that is being used. Valid values are:
# pgsql (Postgresql), mysql (MySQL), sqlite3 (Sqlite3)
driver = pgsql
# The "name" parameter is used for the name of the database
# to connect to.
name = smbta-database
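The keys of the [database] section correspond to the database options that smbtaquery takes on the command line (-M, -N, -S and -H, as used in the upgrade example earlier in this document). The following sketch makes that correspondence explicit by turning a configuration into an argument list. The function name is hypothetical, and the password is left out on purpose because no smbtaquery password option is described in this section.

```python
import configparser

SAMPLE_CLIENT_CONFIG = """\
[network]
smbta_port_number = 3491
smbta_host = examplehost.ex.di

[database]
host = dbhost.example.ex
user = testuser
password = password
driver = pgsql
name = smbta-database
"""


def smbtaquery_db_args(config_text):
    """Translate the [database] section into smbtaquery database arguments."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    database = parser["database"]
    return [
        "-M", database["driver"],  # libDBI driver
        "-N", database["name"],    # database name
        "-S", database["user"],    # database user
        "-H", database["host"],    # database host
    ]
```

For the sample above this gives -M pgsql -N smbta-database -S testuser -H dbhost.example.ex, which is the form the options take when no configuration file is used.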