SMBTA

From SambaWiki
==== Object Identification ====
SMBTA tries to identify the objects given in a query by searching through its database. For example, the user "holger" could exist more than once: users named "holger" could come from two or more domains, each listing "holger" with a different SID. SMBTA therefore looks up the known SIDs of user "holger" and, if more than one user "holger" exists, presents a dialog to choose the right one. In the same way, shares are checked against their domain, and files are checked against their share and domain.
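The disambiguation step can be pictured with a small Python sketch. It assumes a hypothetical in-memory list of known users (name, domain, SID); smbtaquery itself works against the SMBTA database, whose schema is not shown here, so the names `KnownUser`, `candidates` and `needs_dialog` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownUser:
    # Hypothetical record; SMBTA's real database layout may differ.
    name: str
    domain: str
    sid: str


def candidates(name: str, known: list[KnownUser]) -> list[KnownUser]:
    """Return one entry per distinct SID recorded for the given user name."""
    by_sid: dict[str, KnownUser] = {}
    for user in known:
        if user.name == name:
            by_sid.setdefault(user.sid, user)
    # Stable order, so a selection dialog would list candidates predictably.
    return sorted(by_sid.values(), key=lambda u: (u.domain, u.sid))


def needs_dialog(name: str, known: list[KnownUser]) -> bool:
    """A choice is only needed when the name maps to more than one SID."""
    return len(candidates(name, known)) > 1
```

Shares and files could be narrowed the same way by adding the domain (and, for files, the share) to the matching condition.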

==== The functions ====
===== total =====
Syntax: '''total [rw|r|w]'''
* The '''total''' function prints out the sum of bytes transferred by an object.
* '''"total r"''' will compute the sum of bytes '''read'''.
* '''"total w"''' will compute the sum of bytes '''written'''.
* '''"total rw"''' will compute the sum of bytes transferred ('''read''' and '''write''' access).

'''Total function example:'''
<pre>
#Print out the number of bytes written on the share "Fritzboxtest":
'share Fritzboxtest, total w;'
</pre>
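To make the three modes concrete, here is a minimal Python sketch of what "total" computes, assuming hypothetical (operation, bytes) records; the real function runs SQL queries against the SMBTA database.

```python
# Which VFS operations count for each access mode.
MODES = {"r": {"read"}, "w": {"write"}, "rw": {"read", "write"}}


def total(records: list[tuple[str, int]], mode: str = "rw") -> int:
    """Sum the bytes of all records whose operation matches the mode."""
    wanted = MODES[mode]
    return sum(length for op, length in records if op in wanted)
```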

===== list =====
Syntax: '''list [users] [files]'''
* The '''list''' function prints a list of objects related to the given object.
* '''"list users"''' will print out the list of known '''users''' working or involved with the object.
* '''"list files"''' will print out the list of known '''files''' involved with the object.

'''List function example:'''
<pre>
#Print out all users who have ever touched, read or written to file "README":
'file README, list users;'

#Print out the list of known files on share "Fritzboxtest":
'share Fritzboxtest, list files;'
</pre>

===== top =====
Syntax: '''top [number] [files|users|shares] [rw|r|w] [asc|desc]'''
* The '''top''' function prints out a number of top '''files, users,''' or '''shares''' by '''read, write''' or '''read and write''' access.
* The '''"asc"''' and '''"desc"''' arguments are optional. The sort direction of the function can be changed by adding one of these arguments.

'''Top function example:'''
<pre>
#Print out the 10 most used files on share "Fritzboxtest" by read access:
'share Fritzboxtest, top 10 files r;'

#Print out the 10 most used shares on the Samba network:
'global, top 10 shares rw;'
</pre>
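A rough Python sketch of the ranking that "top" performs, assuming hypothetical (object, operation, bytes) records. The document does not state the default sort direction; the sketch assumes descending (largest first).

```python
from collections import defaultdict

# Which VFS operations count for each access mode.
MODES = {"r": {"read"}, "w": {"write"}, "rw": {"read", "write"}}


def top(records, number, mode="rw", order="desc"):
    """Rank objects by transferred bytes and return the first `number` of them.

    `records` is an iterable of hypothetical (object, operation, bytes) tuples;
    `order` is "desc" (largest first, assumed default) or "asc".
    """
    if order not in ("asc", "desc"):
        raise ValueError(f"unknown sort order: {order}")
    sums = defaultdict(int)
    for obj, op, length in records:
        if op in MODES[mode]:
            sums[obj] += length
    ranked = sorted(sums.items(), key=lambda item: item[1], reverse=(order == "desc"))
    return ranked[:number]
```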

===== last_activity =====
Syntax: '''last_activity [number]'''
* The '''last_activity''' function prints out a list of actions an object has been involved in.

'''last_activity Function example:'''
<pre>
#Print out the last 10 actions of user holger:
'user holger, last_activity 10;'

#Print out the last 10 actions in which file "README" was involved:
'file README, last_activity 10;'
</pre>

===== usage =====
Syntax: '''usage [rw|r|w]'''
* The '''usage''' function summarizes the usage (activity) of an object over a virtual day: it lists the 24 hours of the day and shows the average data activity in each hour as a percentage.

'''usage Function example:'''
<pre>
#Print a graph of the total usage of the Samba network:
'global, usage rw;'

#Print a graph of the usage of share "Fritzboxtest" by read-access:
'share Fritzboxtest, usage r;'
</pre>
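The "virtual day" can be sketched by folding every record onto its hour of day. This Python sketch assumes hypothetical (timestamp, bytes) records and computes each hour's share of the total in percent; exactly how smbtaquery averages across days is not described here, so treat the normalization as an assumption.

```python
from datetime import datetime


def usage(records: list[tuple[datetime, int]]) -> list[float]:
    """Return 24 values: each hour's share (in percent) of all bytes transferred."""
    per_hour = [0] * 24
    for stamp, length in records:
        per_hour[stamp.hour] += length  # fold every day onto one virtual day
    grand_total = sum(per_hour)
    if grand_total == 0:
        return [0.0] * 24
    return [100.0 * value / grand_total for value in per_hour]


def render(percentages: list[float], width: int = 40) -> str:
    """Draw a simple text bar graph, one line per hour."""
    lines = []
    for hour, pct in enumerate(percentages):
        bar = "#" * round(pct / 100 * width)
        lines.append(f"{hour:02d}:00 {bar} {pct:.1f}%")
    return "\n".join(lines)
```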

===== search =====
Syntax: '''search [term]'''

* The '''search''' function allows a fuzzy search for '''[term]''' over the whole database.
* It supports '''SQL-style wildcards''' like %*. If the function finds data that matches the search string, it identifies the object and shows related data. For example, if it finds a user name that matches [term], it identifies it as a user and additionally prints the domain the user belongs to.

'''Search function example:'''
<pre>
#Fuzzy search for the term disk12:
'global, search %disk12%;'
</pre>
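SQL-style matching can be mimicked in Python by translating the standard LIKE wildcards (% for any run of characters, _ for exactly one character) into a regular expression. This is a sketch for illustration only: smbtaquery passes the term to the database, and whether the match is case-sensitive depends on the database (PostgreSQL's LIKE is case-sensitive, for example). The sketch is case-sensitive.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate an SQL LIKE pattern: % matches any run of characters, _ one character."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def search(term: str, names: list[str]) -> list[str]:
    """Return the names matched completely by the LIKE-style term."""
    regex = like_to_regex(term)
    return [name for name in names if regex.fullmatch(name)]
```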

===== throughput =====
Syntax: '''throughput [num] [minutes|days|seconds|hours] [r|rw|w]'''
* The '''throughput''' function calculates the data throughput of the last '''[num] minutes, days, seconds,''' or '''hours'''.

'''throughput function example:'''
<pre>
#Sum up the read-only throughput of the last 50 days on share "Fritzboxtest":
'share Fritzboxtest, throughput 50 days r;'
</pre>
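A minimal Python sketch of the windowed sum behind "throughput", assuming hypothetical (timestamp, operation, bytes) records. It returns both the summed bytes and the resulting average rate in bytes per second; which of the two smbtaquery prints is not specified here.

```python
from datetime import datetime, timedelta

# Which VFS operations count for each access mode.
MODES = {"r": {"read"}, "w": {"write"}, "rw": {"read", "write"}}
UNITS = {"seconds", "minutes", "hours", "days"}


def throughput(records, num, unit, mode, now):
    """Return (total bytes, bytes per second) for the last `num` units before `now`.

    `records` holds hypothetical (timestamp, operation, bytes) tuples.
    """
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit}")
    if num <= 0:
        raise ValueError("num must be positive")
    window = timedelta(**{unit: num})
    since = now - window
    total = sum(
        length
        for stamp, op, length in records
        if since <= stamp <= now and op in MODES[mode]
    )
    return total, total / window.total_seconds()
```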

Revision as of 15:18, 24 July 2012

SMB Traffic Analyzer (SMBTA)

About SMBTA

What is SMB Traffic Analyzer

SMB Traffic Analyzer (from now on called SMBTA) is a software suite that uses the Samba CIFS server to create statistics about the data traffic on a Samba network. SMBTA consists of several components:

  • a module in the VFS (Virtual File System) Layer of Samba
  • a system daemon program
  • user oriented programs to visualize the statistics

SMBTA uses SQL storage to record data about the traffic on a network. Because this data is available via SQL, SMBTA is useful to many system administrators.

SMBTA also has real-time interfaces, making it possible to watch Samba traffic in real time on the console, or to interface with rrdtool to create a round robin database.

Knowledge of SQL is not required. Users can also run the client programs to query the database. These programs are specialized for the organization of the database and take much of the work of translating frequently asked questions into SQL statements off the user. They also work over the network and can run on a completely different system from the one hosting the SQL storage.

The basic idea and main concept of SMBTA were born at the SambaXP conference in Göttingen, Germany, in 2007. At the SNIA conference in 2008, I was able to present a prototype of the idea to the Samba team. The code for the module was accepted immediately and introduced to Samba users with Samba 3.2.0.

This document refers to version 2.0 of the SMB Traffic Analyzer VFS module.

Please check SMB Traffic Analyzer’s home page for newer versions of the software or this document.

Why use SMB Traffic Analyzer

Samba services may run on special hardware, such as RAID systems or other specialized storage. With SMBTA, the user can create an overview of how the network is used. SMBTA can answer questions like these:

  • Which of my services is the most used one?
  • At what times does the usage of my service peak?
  • Which service almost never gets used?
  • Which users are the highest traffic generators?
  • What were the most recent file activities on a share, in detail?
  • Which is the most used file by read and write access?

Isn’t SMBTA a tool to control employees?

Yes, it can be used like that, but it does not have to be. Exposing user names to a storage is even forbidden in many countries. Therefore, SMBTA allows any data linked to a person to be anonymized. There are many use cases for SMBTA, such as determining and planning hardware for your services, or analyzing the network before switching to clustered Samba.

The basic concept of SMBTA

Samba servers configured with the VFS module send meta data to smbtad, which provides database access to the smbtatools programs

The Samba CIFS server features a Virtual File System (VFS) layer that allows modules to be linked into it, replacing or enhancing the functions of the VFS. Linked modules behave completely transparently. SMBTA makes use of this functionality by adding a module to Samba’s VFS. In the VFS layer, the module collects data from prominent VFS functions such as write, read and close.

The collected data may be encrypted, and is sent over the network to a receiver program. This daemon program, smbtad, builds a SQL storage from the data. Client programs can then query the generated storage over the network in a specialized way.

Where to get SMBTA

The SMBTA VFS module is called vfs_smb_traffic_analyzer.so and is packaged with Samba versions 3.2.0 and later.

Important: The SMBTA module shipped with Samba versions older than 3.6.0pre1 implements only version 1 of the SMBTA protocol, whereas this document refers only to version 2 of the protocol. The implementation of protocol version 2 in vfs_smb_traffic_analyzer.so can be found in the current master git source tree of Samba, and is expected to ship with Samba 3.6.0.
Tip: For the openSUSE Linux distribution, SMBTAv2 has been backported to Samba releases 3.5.0 and newer. You can retrieve the SMBTAv2 VFS module from the openSUSE BuildService. RPM packages of the smbtatools and smbtad components for several openSUSE releases are also available on the BuildService.

The other parts of SMBTA, the daemon program smbtad and the client programs webSMBTA, smbtaquery and smbtamonitor, are available as source code from the project's homepage at http://holger123.wordpress.com/smb-traffic-analyzer/.

Building SMBTA from the sources

The SMBTA tools smbtaquery and smbtamonitor, as well as the smbtad daemon program, use "cmake" as their primary build tool.

Before trying to build smbtad and smbtatools, check for the following requirements:

  • cmake (required for both)
  • libsmbclient library (required for smbtatools)
  • curses/ncurses library (required for smbtatools)
  • libDBI (required for both)

With cmake, you usually do an "out-of-source" build, meaning that the source is not built in the original source directory. Here is an example build (the same procedure can be followed to compile smbtad). Suppose you have unpacked the smbtatools source tarball at:

 /home/tom/smbtatools

Now create a separate directory in which to build smbtatools, for example:

 mkdir /home/tom/smbtatools-build
 cd /home/tom/smbtatools-build

If any libraries or required components are located outside the system paths, specify them with environment variables before running cmake (the example uses csh/tcsh syntax; in a Bourne-style shell, use export instead):

 setenv CMAKE_INCLUDE_PATH $YOUR_PATH_TO_EXTERNAL_INCLUDES:$ANOTHER_DIRECTORY_WITH_EXTERNAL_INCLUDES
 setenv CMAKE_LIBRARY_PATH $YOUR_PATH_TO_LIBRARIES:$ANOTHER_DIRECTORY_WITH_LIBRARIES

Both smbtatools and smbtad require sqlite to run. If it is not installed on your system, or the installed version is too old, both packages build an included sqlite variant and link against it.

You are now in the build directory. Just run cmake from here:

 cmake ../smbtatools/

If you want the later "make install" to install control files for SMBTA into /etc/init.d, set this cmake cache value:

 cmake -D CMAKE_INSTALL_RCFILE:string=yes ../smbtad/

cmake then checks the requirements of smbtatools/smbtad against your system. If something is missing, it reports this in human-readable messages.

After a successful cmake run, you can run "make" and "make install" as usual in this directory ("make install" might require root privileges):

 make
 make install

If you want to install the package into a different root directory (for example, for packaging), use:

 make DESTDIR=/$YOUR/$DESTINATION install

Setting up SMBTA

Upgrading from a former version

SMBTA provides an update path to convert databases used with former versions to the current one.

Important: If you are upgrading SMBTA from an older version, it is necessary to run this conversion process!

The smbtaquery program includes a command line option "-C --convert":

 smbtaquery -M $DB_DRIVER -N $DATABASE_NAME -S $DATABASE_USER -H $DATABASE_HOST -C

The process is self-explanatory: it presents a menu to choose the version of SMBTA you are coming from. See the chapter about the smbtaquery program for more about the arguments to the -M, -N, -S and -H options.

About the database

To run SMBTA, you need to set up a database. Currently supported databases are:

  • PostgreSQL, tested successfully by the SMBTA development team.
  • sqlite3, tested by the SMBTA development team.
  • MySQL, untested by the SMBTA development team, but supported by libDBI. Feedback is highly welcome!

Furthermore, SMBTA will need the following information to access the database:

  • The name of a user that has access to the database
  • The user's password
  • The host name or IP address of the system running the database
  • The name of the database to access

Initial setup of the database

Important: This step is required if you install SMBTA for the first time!

If you are installing SMBTA for the first time, SMBTA assumes that the administrator has set up an empty database, with a user that has access to it. On an initial run, the smbtad component can then prepare the empty database with "smbtad -T", creating all required tables. See the chapter on smbtad for more about this.

Activating the VFS module on Samba Servers

In general, the Samba CIFS server is configured via smb.conf, its main configuration file. A typical service definition looks like this:

 [ExampleShare]
         path = /space
         read only = no

This defines the share "ExampleShare" as a CIFS service referring to the path /space on the server's storage.

Internet socket

SMBTA logging for this service can be activated by extending the share definition with the parameters that load the SMBTA VFS module:

 [ExampleShare]
         path = /space
         read only = no
         vfs objects = smb_traffic_analyzer
         smb_traffic_analyzer:protocol_version = V2
         smb_traffic_analyzer:host = localhost
         smb_traffic_analyzer:port = 3490

Important: By default, SMBTA uses protocol version 1, so as not to break existing installations. Only if the protocol_version parameter is set to V2 is protocol version 2 used. This documentation refers only to protocol version 2.

The host parameter defines the name of the system to which the VFS module sends its data. The module connects to the port number given with the port parameter.

Unix domain socket

The module can also use a unix domain socket, which is useful when smbtad runs on the same system as the Samba server. In this case the share definition looks like this:

 [ExampleShare]
         path = /space
         read only = no
         vfs objects = smb_traffic_analyzer
         smb_traffic_analyzer:protocol_version = V2
         smb_traffic_analyzer:mode = unix_domain_socket

With the mode parameter set to "unix_domain_socket", the module uses a unix domain socket at /var/tmp/stadsocket.

Anonymization

Anonymization of user-related information can be enabled in the VFS module. Two modes of anonymization are available.

The first method generates a hash number from the user name and appends it to a prefix given in the configuration. For example, a user called "holger" might become "user123" if "user" is the given prefix.

 [ExampleShare]
         path = /space
         read only = no
         vfs objects = smb_traffic_analyzer
         smb_traffic_analyzer:protocol_version = V2
         smb_traffic_analyzer:mode = unix_domain_socket
         # enable prefix based anonymization
         smb_traffic_analyzer:anonymize_prefix = user

The second method anonymizes all users completely: it simply replaces the user name (and the SID) with the given prefix.

 [ExampleShare]
         path = /space
         read only = no
         vfs objects = smb_traffic_analyzer
         smb_traffic_analyzer:protocol_version = V2
         smb_traffic_analyzer:mode = unix_domain_socket
         # enable total anonymization mapped to "user"
         smb_traffic_analyzer:anonymize_prefix = user
         smb_traffic_analyzer:total_anonymization = yes
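Both anonymization modes can be sketched in a few lines of Python. The hash function the VFS module actually uses is not documented here; CRC32 from `zlib` is a stand-in chosen only because it is deterministic, so the numbers it produces will not match the module's. The function name `anonymize_user` is hypothetical, and the SID replacement done by total anonymization is not modeled.

```python
import zlib


def anonymize_user(username: str, prefix: str, total: bool = False) -> str:
    """Return the user name as it would be stored after anonymization."""
    if total:
        # Total anonymization: every user collapses to the bare prefix.
        return prefix
    # Prefix mode: a stable number derived from the name keeps users apart
    # without revealing who they are. CRC32 is only an illustrative stand-in
    # for the module's own hash function.
    number = zlib.crc32(username.encode("utf-8"))
    return f"{prefix}{number}"
```

The trade-off is visible in the sketch: prefix mode still lets statistics distinguish individual (pseudonymous) users, while total anonymization makes all users indistinguishable.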


Setting up smbtad

smbtad runs on the target host of the module. Its main task is to feed a SQL storage with the data it receives from the module, to maintain the storage, and to accept requests from clients.

 smbtad -i 3490 -q 3491

smbtad is a daemon program that can be run by any user. By default, it creates a directory '$HOME/.smbtad/' where it stores its database. The above call makes smbtad listen for connections from the VFS module on port 3490 and for client connections on port 3491.

By default, smbtad creates a database as '$HOME/.smbtad/staddb'. If the database already exists, it will use the existing database.

The command line options of smbtad

If you give smbtad an option it does not understand, it prints its list of supported options. For example, if the user calls:

 smbtad --help

The following output will appear:

  SMB Traffic Analyzer daemon version 1.2.4
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  (C) 2008-2011 Holger Hetterich <ozzy@metal-district.de>
  -S      --dbuser                Specifiy the user for the database.
  -H      --dbhost                Specifiy the host of the database.
  -P      --dbpassword            Specifiy to password to access the db.
  -M      --dbdriver              Specify the libDBI driver to use.
  -N      --dbname                Specify the name of the database.
  -i      --inet-port             Specifiy the port to be used.
                                  Default: 3490.
  -u      --unix-domain-socket    If this parameter is specified,
                                  a unix domain socket at
                                  /var/tmp/stadsocket will be
                                  used.
  -d      --debug-level           Specify the debug level (0-10).
                                  Default: 0.
  -q      --query-port            Port to be used for clients.
  -o      --interactive           Don't run as daemon.
                                  (Runs as daemon by default)
  -c      --config-file           Use configuration file given.
  -t --maintenance-timer <value>  specify the time intervall to
                                  to start the database
                                  maintenance routine. Format is
                                  HH:MM:SS
                                  Example: -m 00:30:00 will run
                                  the maintenance routine every
                                  half hour.
                                  Default: 01:00:00
  -m --maintenance-timer-config
       <value>                    specify a number of days
                                  and a time. Every database
                                  entry which is older than the
                                  the specified number of days
                                  will be deleted by the
                                  maintenance routine.
                                  Format is: DAYS, HH:MM:SS
                                  Default: 1,00:00:00
  -k --keyfile                    Keyfile for encryption to be used
                                  between module and smbtad.
  -p --precision                  Precision value for the build-in
                                  cache. Default is 5.
  -U --use-db                     Specify 0 or 1 as argument. If
                                  this is 0, no database handling
                                  will be done. Default is 1.
  -T --setup                      Do the initial database setup and exit.
  • -i --inet-port: Specify the internet socket port on which smbtad listens for data from the VFS module. If no port number is given, the default of 3490 is used.
  • -d --debug-level: Specify the debug level of smbtad. If smbtad crashes and you want to produce a bug report, please run it with -d 10, the highest debug level. The default value is 0, in which case only fatal errors are reported. The debug messages of smbtad are consumed by syslog and thus appear in your system log.
  • -o --interactive: For debugging, it can be useful not to run smbtad as a daemon program. When run with -o, smbtad does not become a daemon. The default is to run in daemon mode.
  • -c --config-file: Instead of giving command line switches, the user can provide a configuration file.
  • -q --query-port: The internet socket port to which clients connect to request real-time information. If this parameter is not given, it is set to 3491 (in other words, to the value of -i --inet-port + 1).
  • -t --maintenance-timer: To keep the database from growing indefinitely, smbtad includes a maintenance process that cleans up the database according to a rule (see -m --maintenance-timer-config). The -t option specifies the interval, in the format HH:MM:SS, at which the maintenance process is run. For example, with 00:10:00, a maintenance process is started every ten minutes.
  • -m --maintenance-timer-config: Specifies how the maintenance process works. The format is DAYS, HH:MM:SS. For example, to delete everything in the database that is older than one day, give 1,00:00:00 (the default). To delete everything older than two weeks, give 14,00:00:00; to delete everything older than two hours, give 0,02:00:00.
  • -u --unix-domain-socket: If this option is given, a unix domain socket under /var/tmp/stadsocket is used for the connection to the VFS module. This is useful if smbtad runs on the same machine as the Samba server.
  • -n --unix-domain-socket-cl: If this option is given, a unix domain socket under /var/tmp/stadsocket_client is used for the connection to clients such as smbtaquery. This is useful if smbtad runs on the same machine as the tools.
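The two time formats used by -t and -m can be parsed as shown in this Python sketch. The helper names are hypothetical, and `is_expired` only illustrates the retention rule described above, not smbtad's actual maintenance code.

```python
from datetime import datetime, timedelta


def parse_hms(text: str) -> timedelta:
    """Parse HH:MM:SS, the format of -t --maintenance-timer."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def parse_retention(text: str) -> timedelta:
    """Parse "DAYS, HH:MM:SS", the format of -m --maintenance-timer-config."""
    days, clock = text.split(",", 1)
    return timedelta(days=int(days)) + parse_hms(clock)


def is_expired(stamp: datetime, now: datetime, retention: timedelta) -> bool:
    """True if an entry is older than the retention period and would be deleted."""
    return stamp < now - retention
```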

New options since 1.2.3:

  • -p --precision: As of version 1.2.3, smbtad sums up similar VFS R/W entries in its cache and stores only the sum of transferred bytes instead of every single VFS entry. The cache is insert-sort based, and the summing makes the resulting database much smaller than in former releases. Because of this interpolation, statistical results become slightly imprecise. The -p/--precision argument lets the user control the timespan, in seconds, over which the cache sums up similar VFS R/W entries. The default is 5 seconds. The more seconds the user specifies, the smaller the resulting database and the less precise the results queried through smbtaquery. For real data mining, this argument can be set to 0. In this mode, smbtad does not sum up any VFS entries and stores them as is in the database, which is how smbtad behaved in releases before 1.2.3.
  • -U --use-db: As of version 1.2.3, database usage in smbtad can be shut down completely. This is useful when SMBTA is only used with rrddriver or smbtamonitor, which rely only on real-time data. The option requires an integer argument of either 0 or 1. If it is 1 (the default), the database is handled; if it is 0, only real-time data is supported. Note that with --use-db=0, some monitor functionality behaves differently. For example, the TOTAL monitor, which is supposed to transfer the total sum of transferred bytes of an object, cannot query the database for the initial sum and therefore starts at 0. Also, the real-time tools rrddriver and smbtamonitor perform an identification procedure by default, which must be switched off with their -I/--identify switch. If you want to use SMBTA only for rrddriver or smbtamonitor, it is recommended to set this option to 0.
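The effect of -p can be illustrated with a Python sketch that merges entries falling into the same time bucket. Note that this uses fixed, aligned buckets of `precision` seconds, a simpler technique than smbtad's insert-sort based cache; the notion of "similar" (same user, path and operation), the record layout and the name `coalesce` are assumptions.

```python
from collections import defaultdict


def coalesce(entries, precision):
    """Merge similar R/W entries that fall into the same `precision`-second bucket.

    `entries` holds hypothetical (timestamp_seconds, user, path, operation, bytes)
    tuples. A precision of 0 disables merging, mirroring "-p 0".
    """
    if precision == 0:
        return list(entries)
    sums = defaultdict(int)
    for stamp, user, path, op, length in entries:
        bucket = stamp - stamp % precision  # start of the aligned time bucket
        sums[(bucket, user, path, op)] += length
    return [(bucket, user, path, op, length)
            for (bucket, user, path, op), length in sorted(sums.items())]
```

The byte totals are preserved while the number of stored rows shrinks; what is lost is the exact timing inside each bucket, which is the imprecision the option description mentions.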

New options for database-settings with libDBI since 1.2.4:

  • -S --dbuser: Specifies the user of the database, and must be a valid user of the database.
  • -H --dbhost: Specifies the host name of the system where the database runs.
  • -P --dbpassword: Specifies the password of the user (given with -S --dbuser) to access the database.
  • -M --dbdriver: Specifies the driver to use for the database connection.
    • Database drivers are:
      • "pgsql" for Postgresql.
      • "mysql" for MySQL (untested).
      • "sqlite3" for Sqlite3 (untested).
  • -N --dbname: Specifies the name of the database to be used for the database connection.
  • -T --setup: As of version 1.2.4, the database needs to be set up with the -T --setup switch. This accesses the database and creates its initial tables and structure. smbtad does not daemonize after this call; it simply returns.
  • -I --ip: Specifies the interface or network address smbtad binds to for network operation. It can be an IPv4 address, an IPv6 address, or a fully qualified host name, which smbtad resolves. If this option is not given, it defaults to "localhost".


Using a configuration file with smbtad

All the options mentioned in the previous paragraph can also be set in a configuration file. The configuration file uses the same format as ini files known from the Windows platform. An example configuration file is included in the /dist directory of the smbtad package. This chapter walks through a complete configuration file, describing all the options. The configuration file is divided into sections, such as "general" or "network". Lines beginning with # are comments.

# The general section defines options causing
# changes for the whole application
[general]
        # The debug level defines the verbosity in syslog
        # of smbtad. Values from 0 to 10 are supported, 0
        # being the normal mode, and 10 being totally verbose.
        # Note that debug level 10 is causing a speed penalty.
        # If you think you've found a bug, and try to reproduce it,
        # please run smbtad with debug_level = 10.
        debug_level = 0

        # use_db is the equivalent to the -U --use-db command line
        # argument.
        # if use_db is 0, any handling of the database will not be
        # done.
        # This is useful if SMBTA is only to be used with rrddriver
        # or smbtamonitor, and thus only relies on real time data.
        # Default is 1.
        use_db = 1

[network]
        # The smbtad_ip option specifies the network address / interface that
        # smbtad should bind to for network operations.
        # it can either be:
        # A full IPv4 Address, such as:
        # smbtad_ip = 192.168.178.23
        # A full IPv6 Address, such as:
        # smbtad_ip = ::ffff:192.168.178.31
        # Or a full hostname, such as:
        # smbtad_ip = smbtad.host.de
        # (In this case smbtad resolves the host name to its IP address and
        #  uses it.)
        #
        # If the option is not given, it will default to "localhost"
        smbtad_ip = localhost

        # The query_port option defines the internet socket port
        # to be used for talking to clients such as
        # smbtamonitor.
        query_port = 3491

        # The unix_domain_socket option specifies whether a unix domain
        # socket is used for the connection to the VFS module.
        # Its argument is either yes or no.
        unix_domain_socket = no

        # The unix_domain_socket_clients option specifies whether a unix
        # domain socket is used for the connection to real-time clients
        unix_domain_socket_clients = no


[database]
        # The "name" option specifies the name of the database to be used
        # for smbtad. The database must have been prepared by using
        # smbtad -T.
        name = dbname

        #
        # The "host" option specifies the name or IP address of the system
        # running the database
        host = examplehost.test.ex

        #
        # The "driver" option specifies the name of the database driver
        # to use to access the database. Valid values are
        # pgsql (postgresql), mysql (mysql), or sqlite3 (sqlite)
        driver = pgsql

        #
        # The "user" option specifies the name of the user to use to
        # access the database.
        user = testuser

        #
        # The "password" option specifies the password of the user given
        # with the "user" option.
        password = testpassword

[maintenance]
        # To keep the database from growing indefinitely, a maintenance process
        # is included in smbtad. The option "interval" specifies the interval
        # at which the maintenance procedure is run. For example, if you
        # give "00:10:00" as argument, a maintenance procedure will be run
        # every ten minutes.
        interval = 01:00:00

        # The config parameter defines how the maintenance process should work.
        # The format is DAYS, HH:MM:SS. For example, if you want to delete
        # anything in the database that is older than one day, you would
        # give 1,00:00:00 (which is the default). Would you like to delete
        # anything in the database that is older than 2 weeks, then you would
        # give: 14,00:00:00 as parameter. Having deleted everything that
        # is older than 2 hours, would be 0,02:00:00 as parameter.
        config = 01,00:00:00
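Because the file is ini-style, it can be read with Python's standard `configparser`, for instance to check a configuration before handing it to smbtad -c. This is a sketch: the fallback values are the defaults documented above, `load_settings` and `SAMPLE` are hypothetical, and only a subset of the options is covered.

```python
import configparser

# A trimmed smbtad configuration in the format shown above.
SAMPLE = """\
[general]
        # 0 is the normal mode, 10 is totally verbose
        debug_level = 0

        use_db = 1

[network]
        smbtad_ip = localhost
        query_port = 3491
        unix_domain_socket = no

[maintenance]
        interval = 01:00:00
        config = 01,00:00:00
"""


def load_settings(text: str) -> dict:
    """Read an smbtad-style ini file, falling back to the documented defaults."""
    # interpolation=None so that values such as passwords may contain "%".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        "debug_level": parser.getint("general", "debug_level", fallback=0),
        "use_db": parser.getboolean("general", "use_db", fallback=True),
        "smbtad_ip": parser.get("network", "smbtad_ip", fallback="localhost"),
        "query_port": parser.getint("network", "query_port", fallback=3491),
        "unix_domain_socket": parser.getboolean(
            "network", "unix_domain_socket", fallback=False),
        "interval": parser.get("maintenance", "interval", fallback="01:00:00"),
        "config": parser.get("maintenance", "config", fallback="1,00:00:00"),
    }
```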

Controlling smbtad through the rcsmbtad script

The smbtad distribution includes LSB-compliant scripts to start and stop the service and to check its availability. Provided you have set up a configuration file for smbtad in /etc/smbtad.conf, the scripts use it as the configuration for smbtad. smbtad can then be

  • started with:
 rcsmbtad start
  • stopped with:
 rcsmbtad stop
  • and checked for availability with:
 rcsmbtad status

Checking the installation

When everything is set up, move some data onto the shares on which you have activated the VFS module. For example, after writing a file to a share, you can check the installation by looking into the database directly with the sqlite3 command line interface (assuming sqlite3 is the database in use):

 sqlite3 /var/lib/staddb
 sqlite> select * from write;

You should now see entries for the data written by the user you made the transfer with. If so, your SMBTA installation works fine and you can start pointing more shares to it.
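The same check can be scripted with Python's built-in `sqlite3` module, assuming sqlite3 is the database in use. The column layout of the write table is not described in this document, so the sketch selects all columns and the table created in the test is hypothetical; `latest_writes` and `check_installation` are illustrative names.

```python
import sqlite3
from contextlib import closing


def latest_writes(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Fetch up to `limit` rows from the table named "write"."""
    # "write" is quoted so that it is always treated as an identifier.
    return conn.execute('SELECT * FROM "write" LIMIT ?', (limit,)).fetchall()


def check_installation(db_path: str) -> bool:
    """Return True if the database at db_path already holds write entries."""
    with closing(sqlite3.connect(db_path)) as conn:
        return len(latest_writes(conn, 1)) > 0
```

If the tables have not been created yet (no smbtad -T run), the query raises `sqlite3.OperationalError` instead of returning False, which also points at a setup problem.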

Using the client programs

Using a configuration file with the client programs

All client programs described in the following sections can use a configuration file. Since the tools have quite similar options and parameters, a single configuration file is used by default, located at

 $HOME/.smbtatools/smbtatools.config

The following sample configuration file describes all possible options. For further details on these options, see the chapter about the respective program below.

[general]
# In general, the programs are more verbose by running them
# at debug level 10. debug level 0 is the default
# debug_level = 10

[network]
# The "smbta_port_number" option is used for all real-time
# programs that are using a direct connection to smbtad,
# such as smbtamonitor or rrddriver.
# It is the port number on which smbtad is listening for
# incoming client connections
smbta_port_number = 3491

# The "smbta_host" option is used as the host to connect
# to that has smbtad running. This option is used for all
# real-time programs that require a direct connection
# to smbtad, such as smbtamonitor or rrddriver.
smbta_host = examplehost.ex.di

# The "unix_domain_socket" option is used for all real-time
# programs that are using a direct connection to smbtad,
# such as smbtamonitor or rrddriver.
# If this option is set to something, such as "yes",
# the connection to smbtad will be done by a unix domain socket
# instead of an internet socket.
# The default behaviour is to use an internet socket, and
# if this option is not given, it won't be used.
# unix_domain_socket = yes

[database]

# The "host" parameter is used for smbtaquery. Its argument
# is the hostname or ip-address of the system that is running
# the database.
host = dbhost.example.ex

# The "user" parameter is used for smbtaquery. Its argument
# is the name of the user on the database system to use for
# doing queries.
user = testuser

# The "password" parameter is used for smbtaquery. Its argument
# is the password of the user given in the "user" parameter.
password = password

# The "driver" parameter selects the libDBI driver used for
# the connection, and thus specifies the type of database
# in use. Valid values are:
# pgsql (Postgresql), mysql (MySQL), sqlite3 (Sqlite3)
driver = pgsql

# The "name" parameter is used for the name of the database
# to connect to.
name = smbta-database
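Because the file uses INI-style sections and key = value pairs, it can also be read from your own scripts. The following Python sketch parses it with the standard configparser module. The function name load_smbta_config and the shape of the returned dictionary are illustrative choices, not part of smbtatools. Defaults follow the comments above: debug level 0, and an internet socket unless unix_domain_socket is set.

```python
import configparser


def load_smbta_config(text: str) -> dict:
    """Parse smbtatools.config-style text into a plain dict (hypothetical helper)."""
    # interpolation=None keeps values such as passwords containing "%" intact.
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    return {
        # Debug level 0 is the documented default.
        "debug_level": cp.getint("general", "debug_level", fallback=0),
        "smbta_port_number": cp.getint("network", "smbta_port_number", fallback=None),
        "smbta_host": cp.get("network", "smbta_host", fallback=None),
        # Any non-empty value switches to a Unix domain socket.
        "unix_domain_socket": bool(cp.get("network", "unix_domain_socket", fallback="")),
        "database": {
            key: cp.get("database", key, fallback=None)
            for key in ("host", "user", "password", "driver", "name")
        },
    }
```

Pass it the contents of the configuration file. Full-line comments starting with # are skipped by configparser, so the sample above parses as-is.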

smbtaquery

The possibilities for deriving statistics from the data set are wide-ranging. Instead of querying the database directly via SQL, users may use smbtaquery. It is tailored to the layout of the database and runs many preconfigured queries to produce statistical and informational data. smbtaquery provides a simplified language for expressing what the user wants, translates the commands into SQL, and runs the resulting SQL query against the database. The result is then printed out.

Requirements

smbtaquery produces XML to represent its results. To convert the XML data to the desired output format, it uses the xsltproc XSL processor, which it expects at /usr/bin/xsltproc. xsltproc is part of the libxslt package ( http://xmlsoft.org/XSLT/ ).

smbtaquery comes with stylesheets telling the xsltproc program how to translate the raw XML data into other formats. It expects these stylesheets in the directory /usr/share/smbtatools/.

Tip: If the stylesheets are installed at a different place, set the SMBTATOOLS_DATA_PATH environment variable to the path where the stylesheets reside:
 export SMBTATOOLS_DATA_PATH=$YOUR/$PATH/$TO/$THE/$DATA
Tip: If the location of the xsltproc program is different from /usr/bin/, you can tell smbtaquery by setting the SMBTATOOLS_XSLTPROC_PATH environment variable, such as:
 export SMBTATOOLS_XSLTPROC_PATH=$YOUR/$PATH/$TO/$XSLTPROC
Tip: If you don’t intend to use the xsltproc program, don’t have it available, or want to use a different XSL processor, you can generate raw xml output by calling smbtaquery with the -x $FILE option. The resulting XML output will be written to $FILE. It can then be processed by a different XSL processor.
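If you post-process raw XML written with -x, the two environment variables above can be honoured in the same way. The following Python sketch resolves the stylesheet directory and the xsltproc binary with the documented defaults and builds an xsltproc command line. xsltproc takes the stylesheet first, then the XML document. The sketch assumes that SMBTATOOLS_XSLTPROC_PATH holds the full path of the binary, and the helper names are hypothetical.

```python
import os

# Defaults documented above.
DEFAULT_DATA_PATH = "/usr/share/smbtatools/"
DEFAULT_XSLTPROC = "/usr/bin/xsltproc"


def resolve_tool_paths(env=None):
    """Return (stylesheet directory, xsltproc binary), honouring the overrides."""
    env = os.environ if env is None else env
    data_path = env.get("SMBTATOOLS_DATA_PATH") or DEFAULT_DATA_PATH
    # Assumption: the variable holds the full path of the xsltproc binary.
    xsltproc = env.get("SMBTATOOLS_XSLTPROC_PATH") or DEFAULT_XSLTPROC
    return data_path, xsltproc


def xsltproc_command(stylesheet, xml_file, env=None):
    """Build an argv that applies a stylesheet to XML written with -x."""
    data_path, xsltproc = resolve_tool_paths(env)
    # xsltproc expects the stylesheet first, then the document to transform.
    return [xsltproc, os.path.join(data_path, stylesheet), xml_file]
```

For example, passing the list returned by xsltproc_command("report.xsl", "result.xml") to subprocess.run would apply a stylesheet named report.xsl (a hypothetical file name) to result.xml and print the result.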

smbtaquery command line arguments

If smbtaquery is called with too few or invalid arguments, it prints its list of available command line options:

holger@linux-lm9n:~/Dev/smbtatools/run> ./smbtaquery
ERROR: not enough arguments.

smbtaquery version 1.2.3-devel
(C)opyright 2011 by Benjamin Brunner
(C)opyright 2011 by Michael Haefner
(C)opyright 2011 by Holger Hetterich
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

-M      --dbdriver <str>        Set the libDBI database driver to use.
-N      --dbname <str>          Set the name of the database to open.
-S      --dbuser <str>          Set the name of the user for the database.
-H      --dbhost <str>          Set the hostname to connect to.
-P      --dbpassword <str>      Set the password to use.
-d      --debug-level <num>     Set the debug level to work
                                with to <num>. Default: 0
-c      --config-file <file>    Load the configuration from
                                a file given as <file>.
-q      --query                 Run an interpreter command,
                                or run a SQL select command.
-p      --command-help          Interpreter command description.
-f      --file <file>           Read the commands from a file.
                                connect to smbtad.
-x      --xml <file>            Output XML to file <file>.
-o      --output                Specify the format to output.
                                Default: ascii
-k      --keyfile <file>        Enable encryption and load the
                                key from <file>.
-K      --create-key <file>     Create a key for encryption in <file>.
-I      --identify <num>        0 = don't run identification,
                                1 = run idendification (default)
-C      --convert               run an interactive conversion/update
                                process to convert an older database
                                to this version of SMBTA.
-t      --test-db               Dry run; only try to connect to
                                the database and report the result
                                to the terminal.
  • -M --dbdriver: Takes a string as argument to specify the driver used to access the database. Valid choices are pgsql (PostgreSQL), mysql (MySQL) and sqlite3 (SQLite).
  • -N --dbname: Takes a string as argument to specify the name of the database to be used.
  • -S --dbuser: Takes a string as argument to specify the name of the user who is accessing the database.
  • -H --dbhost: Takes a string as argument to specify the hostname or IP address of the system that is running the database.
  • -P --dbpassword: Takes a string as argument to specify the password for the user accessing the database.
  • -d --debug-level: Sets the debug level, ranging from 0 (the default) to 10. To run smbtaquery more verbosely, for example to see the SQL queries it actually runs, set this to 10.
  • -c --config-file: Takes a string as argument naming a configuration file. If this parameter is given, smbtaquery loads the configuration file, which overrides anything given on the command line.
  • -q --query: This command line argument is used to run a query in the language the smbtaquery interpreter provides. See the next sections for more details on this function.
  • -p --command-help: Prints a description of the interpreter commands.
  • -f --file: Instead of taking the query from the -q --query option, smbtaquery can work sequentially through the commands in a file. See the next sections for more details on this function.
  • -x --xml: smbtaquery generally produces XML output, which is automatically processed with libxslt to generate the preferred output format. To make smbtaquery write its XML output to a file, use this option and provide the filename as argument.
  • -o --output: Through XML processing, smbtaquery offers a choice of output formats. If this option is not given, the default is "ascii". Currently valid values are "ascii" (ASCII text output) and "html" (HTML output).
  • -k --keyfile: Takes a filename as argument, enables encryption and loads the key from that file.
  • -K --create-key: Takes a string as argument and creates a 128-bit AES key to encrypt data transfer. The string names the file to be created to hold the key. This keyfile can be used with rrddriver or smbtamonitor to encrypt the data flow with smbtad.
  • -I --identify: Takes an integer argument, either 0 or 1. With 0, smbtaquery does not try to identify the objects given in a query. This can be useful, for example, when smbtaquery is called from webSMBTA, which always passes a set of pre-identified objects. With 1, which is the default if the option is not given, smbtaquery tries to identify objects every time. See the next sections for more details on this function.
  • -C --convert: If this option is given, smbtaquery will not go into normal operation, but instead provides an interactive session with the aim to convert databases from one version to the current one.
  • -t --test-db: If this option is given, smbtaquery tries to connect to the database, reports the result and exits. It can be used to quickly check if the configuration is fine.

General usage

The following is an example of an smbtaquery run:

 smbtaquery -H databasehost.example.ex -S dbuser -P dbpasswd -N smbtad -M pgsql -q 'global, total rw;'

This tells smbtaquery to connect to the PostgreSQL database smbtad on host databasehost.example.ex as user dbuser with password dbpasswd. Finally, the command line option -q (or --query) provides the query for the built-in interpreter of smbtaquery to run. More on this follows in the later chapters.
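When smbtaquery is driven from a script, the same options can be assembled as an argument list. The following Python sketch builds the command from a dictionary of database settings and runs it with subprocess. The dictionary keys mirror the [database] section of the configuration file. The function names are hypothetical, and only options from the help output above are used. Passing the argv as a list avoids shell quoting problems with the query string.

```python
import subprocess


def smbtaquery_argv(db, query, output="ascii", binary="smbtaquery"):
    """Build the smbtaquery argument list for one interpreter query."""
    return [
        binary,
        "-H", db["host"],      # database host
        "-S", db["user"],      # database user
        "-P", db["password"],  # database password
        "-N", db["name"],      # database name
        "-M", db["driver"],    # libDBI driver: pgsql, mysql or sqlite3
        "-o", output,          # ascii (default) or html
        "-q", query,           # interpreter query, e.g. 'global, total rw;'
    ]


def run_smbtaquery(db, query, output="ascii"):
    """Run smbtaquery and return what it printed to stdout."""
    # check=True raises CalledProcessError if smbtaquery exits with an error.
    result = subprocess.run(smbtaquery_argv(db, query, output),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A password given with -P is visible to other users in the process list. Loading the settings from a configuration file with -c avoids that.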

Output modes, saving of results to a file

By default, smbtaquery prints its results to the terminal it is running on. To save the results, redirect the output to a file, such as:

 smbtaquery -q 'global, usage rw;' > test.txt

smbtaquery can produce different output formats. Currently, ASCII text output (the default), XML and HTML are implemented. The output format can be specified with the -o $FORMAT option:

 smbtaquery -q 'global, usage rw;' -o html > test.html

The generated file "test.html" can then be loaded into a webbrowser for viewing.

The smbtaquery interpreter

To ease the use of the database, smbtaquery implements a simplified query language specialized for SMBTA. There are two ways to run the built-in interpreter of smbtaquery: either with the -q switch on the command line to run a single line of commands, or by specifying a file containing the commands to run.

In the following example of an smbtaquery call, the -q parameter is used:

 smbtaquery -q 'global, total r;'

To read commands from a file, use the -f parameter. Its argument is the name of the file to run commands from:

 smbtaquery -f testfile.txt

Any line in the file beginning with # is ignored as a comment. Every other line must define one or more objects. Here is an example file:

# Short example
# Show the global usage of the Samba network
global, usage rw;
#
# Show the sum of bytes written by user 'holger'
user holger, total w;

The commands in the file are run in the order given.

Syntax

Commands are separated by commas, and any parameters to a command are separated by spaces. The last command ends with a semicolon instead of a comma.

Here is an example query:

 'global, usage rw;'
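These syntax rules can be illustrated with a small parser. The following Python sketch is not smbtaquery's own interpreter. It splits a line into commands at the commas, splits each command into words at whitespace, and requires the closing semicolon. When reading a command file, it skips lines starting with #. Skipping blank lines as well is an assumption.

```python
def parse_query(line):
    """Split one interpreter line into [name, *parameters] lists (sketch)."""
    line = line.strip()
    # The last command must be terminated by a semicolon.
    if not line.endswith(";"):
        raise ValueError(f"missing terminating semicolon: {line!r}")
    # Commands are separated by commas, parameters by whitespace.
    commands = line[:-1].split(",")
    return [cmd.split() for cmd in commands if cmd.strip()]


def parse_command_file(text):
    """Parse a command file, skipping '#' comment lines and blank lines."""
    queries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        queries.append(parse_query(line))
    return queries
```

For example, parse_query("global, usage rw;") yields [["global"], ["usage", "rw"]]: the object comes first, followed by the function and its parameters.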
The objects
Important: Every function called by smbtaquery relates to an object, which is named first. An object can be a file, a share, a user name, a domain, or global, which does not refer to a specific object.

For example if you make a query like this:

 'share Fritzboxtest, usage rw;'

The call of the function "usage rw" (introduced later in this chapter) is bound to the share "Fritzboxtest". All results relate to this share. The general syntax is to name the object first, followed by the functions to run on it.

Say "total r" is a function to get the total number of bytes read by an object. A query could then be:

 'user holger, total r;'

This call would print the total number of bytes read by the user holger.

When several objects are defined in a row, the interpreter combines them with AND. For example:

 'file README, share pool, total r;'

In this example we query for the file README on the share pool, and then run the total function on it.

The following object definitions are currently possible:

'share $SHARE' a Samba file service (a share)
'user $USER' a User
'file $FILE' a specific file
'global' Global.
'domain $DOMAIN' a domain.

The "global" object is special in how it treats the functions that follow it. Functions running on global are not limited by anything and always work on the full database. The following query:

 'global, total rw;'

prints the total number of bytes ever transferred (read and written).

Time Modifiers

Any function called on an object can make use of time modifiers, which specify the time range the function works on. The supported keywords are FROM - TO and SINCE, and they must be used on objects.

Example for FROM-TO:

global from 2011-05-25 to today, total rw;

Example for SINCE:

user holger since yesterday, total rw;

Dates and times are given in the format YYYY-MM-DD HH:MM:SS. The FROM-TO example above uses a date only.

In addition, the keywords yesterday, today and now make time specification easier.
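Time specifications can be checked before a query is sent. The following Python sketch accepts three forms: the keywords yesterday, today and now, the YYYY-MM-DD HH:MM:SS format, and the date-only form used in the FROM-TO example. It then attaches a FROM-TO or SINCE range to an object definition. The helper names are hypothetical, and smbtaquery itself may accept more forms than this.

```python
from datetime import datetime

KEYWORDS = ("yesterday", "today", "now")
# Full timestamp as documented, plus the date-only form used in the example.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def is_valid_time_spec(spec):
    """Return True if spec is a keyword or a date/time in an accepted format."""
    if spec in KEYWORDS:
        return True
    for fmt in FORMATS:
        try:
            datetime.strptime(spec, fmt)
            return True
        except ValueError:
            pass
    return False


def with_time_range(obj, start, end=None):
    """Attach 'from START to END' or, without END, 'since START' to an object."""
    for spec in (start, end):
        if spec is not None and not is_valid_time_spec(spec):
            raise ValueError(f"invalid time specification: {spec!r}")
    if end is None:
        return f"{obj} since {start}"
    return f"{obj} from {start} to {end}"
```

For instance, with_time_range("global", "2011-05-25", "today") reproduces the object part of the FROM-TO example above.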

Object Identification

SMBTA tries to identify the given objects by searching through its database. For example, the user "holger" could exist more than once: "holger" could come from two or more domains, each of which lists user "holger" with a different SID. SMBTA therefore looks up the known SIDs of user "holger" and presents a dialog to choose the right user in case more than one user "holger" exists. In the same way, shares are checked against domains, and files against the share and the domain.
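The identification step can be pictured as a lookup from a name to all the (domain, SID) pairs seen for it. The following Python sketch is a hypothetical model, not SMBTA's implementation. A name with exactly one candidate is resolved directly. A name with several candidates is handed to a choose callback, which stands in for the interactive dialog. The SIDs in the test data are made-up examples.

```python
from collections import defaultdict


def index_names(records):
    """Map each name to the set of (domain, SID) pairs it was seen with."""
    index = defaultdict(set)
    for name, domain, sid in records:
        index[name].add((domain, sid))
    return index


def identify(name, index, choose):
    """Resolve a name to one (domain, SID) pair, asking only when ambiguous."""
    # .get() does not insert missing keys into the defaultdict.
    candidates = sorted(index.get(name, ()))
    if not candidates:
        raise LookupError(f"unknown object: {name!r}")
    if len(candidates) == 1:
        return candidates[0]
    # Several domains know this name: let the caller pick (the "dialog").
    return choose(candidates)
```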

The functions

total

Syntax: total [RW] [R] [W]

  • The total function prints out the sum of bytes transferred by an object.
  • "total r" will compute the sum of bytes read.
  • "total w" will compute the sum of bytes written.
  • "total rw" will compute the sum of bytes transferred (read and write access).

Total function example:

 #Print out the number of bytes written on the share "Fritzboxtest":
 'share Fritzboxtest, total w;'
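Conceptually, total filters the recorded transfers by object and access type and adds up the byte counts. The following Python sketch models this over an in-memory list. The Transfer record and its fields are hypothetical and do not reflect the actual database schema.

```python
from typing import NamedTuple


class Transfer(NamedTuple):
    """One recorded access (hypothetical layout)."""
    user: str
    share: str
    file: str
    op: str      # "r" for read, "w" for write
    nbytes: int


def total(transfers, mode="rw", **where):
    """Sum bytes for transfers matching mode and every field=value in where."""
    if mode not in ("r", "w", "rw"):
        raise ValueError(f"mode must be r, w or rw, not {mode!r}")
    return sum(
        t.nbytes
        for t in transfers
        # "r" and "w" are both contained in "rw", so rw counts every access.
        if t.op in mode and all(getattr(t, key) == value for key, value in where.items())
    )
```

Here total(data, "w", share="Fritzboxtest") plays the role of 'share Fritzboxtest, total w;'.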
list

Syntax: list [users] [files]

  • The list function prints a list of objects related to the given object.
  • "list users" will print out the list of known users working or involved with the object.
  • "list files" will print out the list of known files involved with the object.

List function example:

  #Print out all users who have ever touched, read or written to file "README":
  'file README, list users;'

  #Print out the list of known files on share "Fritzboxtest":
  'share Fritzboxtest, list files;'
top

Syntax: top [number] [files] [users] [shares] [rw] [r] [w] [asc] [desc]

  • The top function prints out a number of top files, users, or shares by read, write or read and write access.
  • The "asc" and "desc" arguments are optional. The sort direction of the function can be changed by adding one of these arguments.

Top function example:

  #Print out the 10 most used files on share "Fritzboxtest" by read access:
  'share Fritzboxtest, top 10 files r;'

  #Print out the 10 most used shares on the Samba network:
  'global, top 10 shares rw;'
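The top function can be modelled as grouping transfers by file, user or share, summing the bytes, and sorting. The following Python sketch assumes that "most used" means most bytes transferred and that descending order is the default. The record layout (dicts with user, share, file, op and nbytes keys) is hypothetical.

```python
from collections import Counter

FIELDS = {"files": "file", "users": "user", "shares": "share"}


def top(records, number, kind, mode="rw", order="desc"):
    """Rank files, users or shares by bytes transferred (sketch)."""
    if mode not in ("r", "w", "rw") or order not in ("asc", "desc"):
        raise ValueError("mode must be r, w or rw; order must be asc or desc")
    field = FIELDS[kind]
    sums = Counter()
    for rec in records:
        # "r" and "w" are both substrings of "rw", so rw counts everything.
        if rec["op"] in mode:
            sums[rec[field]] += rec["nbytes"]
    ranked = sorted(sums.items(), key=lambda item: item[1], reverse=(order == "desc"))
    return ranked[:number]
```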
last_activity

Syntax: last_activity [number]

  • The last_activity function prints out a list of actions an object has been involved in.

last_activity Function example:

  #Print out the last 10 actions of user holger:
  'user holger, last_activity 10;'

  #Print out the last 10 actions in which file "README" has been involved:
  'file README, last_activity 10;'
usage

Syntax: usage [rw|r|w]

  • The usage function summarizes the usage or activity of an object by building a virtual day of 24 hours and showing the average data activity per hour as a percentage.

usage Function example:

  #Print a graph of the total usage of the Samba network:
  'global, usage rw;'

  #Print a graph of the usage of share "Fritzboxtest" by read-access:
  'share Fritzboxtest, usage r;'
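The virtual day can be modelled by folding every transfer onto its hour of day and expressing each hour's bytes as a share of the total. The following Python sketch assumes this reading of "average data activity by percentage". The record layout (datetime, op, nbytes) and the function names are hypothetical.

```python
def usage_profile(transfers, mode="rw"):
    """Share of bytes (in percent) per hour of a 24-hour virtual day."""
    if mode not in ("r", "w", "rw"):
        raise ValueError("mode must be r, w or rw")
    per_hour = [0] * 24
    for when, op, nbytes in transfers:
        if op in mode:
            # Fold every day onto the same 24 hours.
            per_hour[when.hour] += nbytes
    total = sum(per_hour)
    if total == 0:
        return [0.0] * 24
    return [100.0 * b / total for b in per_hour]


def render_profile(profile, width=40):
    """Return one ASCII bar line per hour, e.g. for terminal output."""
    return [f"{hour:02d}:00 {'#' * round(pct * width / 100):<{width}} {pct:5.1f}%"
            for hour, pct in enumerate(profile)]
```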
search

Syntax: search [term]

  • The search function allows a fuzzy search for [term] over the whole database.
  • It allows SQL-style wildcards like %*. If the function finds data that matches the search string, it identifies the object and shows related data. For example, if it finds a user name that matches [term], it identifies it as a user and additionally prints the domain the user belongs to.

Search function example:

  #Fuzzy search for the term disk12:
  'global, search %disk12%;'
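A fuzzy search of this kind maps naturally onto SQL LIKE, where % matches any sequence of characters and _ matches a single character. The following Python sketch runs such a search against an SQLite table named objects with kind and name columns. That table is hypothetical, not the SMBTA schema. LIKE is case-insensitive for ASCII characters in SQLite but case-sensitive in PostgreSQL.

```python
import sqlite3


def search(con, term):
    """Return (kind, name) rows whose name matches an SQL LIKE pattern."""
    # The pattern is passed as a bound parameter, never spliced into the SQL text.
    cur = con.execute(
        "SELECT kind, name FROM objects WHERE name LIKE ? ORDER BY kind, name",
        (term,),
    )
    return cur.fetchall()
```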
throughput

Syntax: throughput [num] [minutes|days|seconds|hours] [r|rw|w]

  • The throughput function calculates the data throughput of the last [num] minutes, days, seconds, or hours.

throughput function example:

  #Sum up the read-only throughput of the last 50 days on share "Fritzboxtest".
  'share Fritzboxtest, throughput 50 days r;'
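The following Python sketch shows one way to compute such a figure: sum the bytes transferred within the window ending now and divide by the window length. It assumes throughput is reported in bytes per second, which the text above does not specify. The record layout (datetime, op, nbytes) is hypothetical.

```python
from datetime import datetime, timedelta

UNITS = ("seconds", "minutes", "hours", "days")


def throughput(transfers, num, unit, mode="rw", now=None):
    """Return (bytes moved, bytes per second) over the last `num` units."""
    if unit not in UNITS or mode not in ("r", "w", "rw") or num <= 0:
        raise ValueError("need num > 0, unit in seconds/minutes/hours/days, mode r/w/rw")
    # timedelta accepts seconds, minutes, hours and days as keyword arguments.
    window = timedelta(**{unit: num})
    now = now if now is not None else datetime.now()
    start = now - window
    moved = sum(nbytes for when, op, nbytes in transfers
                if op in mode and start <= when <= now)
    return moved, moved / window.total_seconds()
```

Here throughput(log, 50, "days", "r") corresponds to the example query above, applied to records already limited to share "Fritzboxtest".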