Configuring clustered Samba

Revision as of 06:31, 2 June 2007 by Sahlberg (STATD_SHARED_DIRECTORY should be in the nfs and not ctdb sysconf file)

Setting up a simple CTDB Samba cluster

As of April 2007 you can set up a simple Samba3 or Samba4 CTDB cluster, running either on loopback (with simulated nodes) or on a real cluster with TCP. This page tells you how to get started.

Clustering Model

The setup instructions on this page are modelled on setting up a cluster of N nodes that function in nearly all respects as a single multi-homed node. So the cluster will export N IP interfaces, each of which is equivalent (same shares) and which offers coherent CIFS file access across all nodes.

The clustering model uses IP takeover techniques to ensure that the full set of public IP addresses assigned to services on the cluster remains available to clients even when some nodes have failed and become unavailable.


Getting the code

You need two source trees: one is a copy of Samba3 with clustering patches, and the other is the ctdb code itself. Both source trees are stored in bzr repositories. See http://bazaar-vcs.org/ for more information on bzr.

The fastest way to check out an initial copy of the Samba3 tree with clustering patches is:

  rsync -avz samba.org::ftp/unpacked/samba_3_0_ctdb .

To update this tree when improvements are made in the upstream code do this:

   cd samba_3_0_ctdb
   bzr merge http://samba.org/~tridge/samba_3_0_ctdb

If you don't have bzr and can't easily install it, then you can instead use the following command to update your tree to the latest version:

   cd samba_3_0_ctdb
   rsync -avz samba.org::ftp/unpacked/samba_3_0_ctdb/ .

Volker Lendecke maintains his own tree that sometimes has later changes in it. To merge from Volker's tree use this command:

   bzr merge http://www.samba.sernet.de/vl/bzr/3_0-ctdb/

Generally the two trees will only be a day or so apart, but Samba/ctdb is undergoing fast development at the moment, so one day can include quite a few changes.

To get an initial checkout of the ctdb code do this:

  rsync -avz samba.org::ftp/unpacked/ctdb .

To update this tree when improvements are made in the upstream code do this:

   cd ctdb
   bzr merge http://samba.org/~tridge/ctdb

If you don't have bzr and can't easily install it, then you can instead use the following command to update your tree to the latest version:

   cd ctdb
   rsync -avz samba.org::ftp/unpacked/ctdb/ .


Building the Samba3 tree

To build a copy of Samba3 with clustering and ctdb support you should do this:

   cd samba_3_0_ctdb/source
   ./autogen.sh
   ./configure --prefix=/gpfs0/samba/prefix --with-ctdb=/usr/src/ctdb --with-cluster-support --enable-pie=no
   make proto
   make

You should replace the /gpfs0/samba/prefix path with the cluster shared storage path you will use to install Samba. The path should be a directory that is the same on all nodes of the cluster. If you are setting up a virtual cluster on loopback then this can be any local directory.

The /usr/src/ctdb path should be replaced with the path to the ctdb sources that you downloaded above.

Building the CTDB tree

To build a copy of the CTDB code you should do this:

  cd ctdb
  ./autogen.sh
  ./configure --prefix=/gpfs0/samba/prefix
  make
  make install

Installing Samba3

To install Samba3 you should do this:

 cd samba_3_0_ctdb/source
 make install

If your path points to another version of Samba, it is recommended that you reset your path to point to the bin/ and sbin/ directories of this newer Samba installation (e.g. /gpfs0/samba/prefix/bin and /gpfs0/samba/prefix/sbin). Then you need to configure an appropriate smb.conf. There is a very simple example in samba_3_0_ctdb/examples/ctdb. You need to put this smb.conf in the lib/ subdirectory of the prefix you chose above.

Next you need to initialise the Samba password database, e.g.

 smbpasswd -a root

or if you have not reset your path to point to this newer version of Samba:

 /gpfs0/samba/prefix/bin/smbpasswd -a root

Samba with clustering must use the tdbsam or ldap SAM passdb backends (it must not use the default smbpasswd backend). The rest of the configuration of Samba is exactly as it is done on a normal system. See the docs on http://samba.org/ for details.

CTDB Cluster Configuration

These are the primary configuration files for CTDB:

/etc/sysconfig/ctdb

# Options to ctdbd. This is read by /etc/init.d/ctdb
#
# the NODES file must be specified or ctdb won't start
# it should contain a list of IPs that ctdb will use
# it must be exactly the same on all cluster nodes
# defaults to /etc/ctdb/nodes
# NODES=/etc/ctdb/nodes
#
# the directory to put the local ctdb database files in
# defaults to /var/ctdb
# DBDIR=/var/ctdb
#
# the script to run when ctdb needs to ask the OS for help,
# such as when a IP address needs to be taken or released
# defaults to /etc/ctdb/events
# EVENT_SCRIPT=/etc/ctdb/events
#
# the location of the local ctdb socket
# defaults to /tmp/ctdb.socket
# CTDB_SOCKET=/tmp/ctdb.socket
#
# what transport to use. Only tcp is currently supported
# defaults to tcp
# TRANSPORT="tcp"
#
# should ctdb do IP takeover? If it should, then specify a file
# containing the list of public IP addresses that ctdb will manage
# Note that these IPs must be different from those in $NODES above
# there is no default
# PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
#
# when doing IP takeover you also must specify what network interface
# to use for the public addresses 
# there is no default
# PUBLIC_INTERFACE=eth0
#
# where to log messages
# the default is /var/log/log.ctdb
# LOGFILE=/var/log/log.ctdb
#
# what debug level to run at. Higher means more verbose
# the default is 0
# DEBUGLEVEL=0
#
# use this to specify any local tcp ports to wait on before starting
# ctdb. Use 445 and 139 for Samba
# the default is not to wait for any local services
# CTDB_WAIT_TCP_PORTS="445 139"
#
# any other options you might want. Run ctdbd --help for a list
# CTDB_OPTIONS=
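
As a minimal sketch, a cluster that uses IP takeover and waits for Samba before starting might uncomment the settings below. The values are simply the illustrative ones from the comments above (eth0 in particular is only an example interface name), not recommendations for any particular site.

```shell
# /etc/sysconfig/ctdb (sketch): values copied from the commented examples above
NODES=/etc/ctdb/nodes
PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
PUBLIC_INTERFACE=eth0          # example interface name, adjust per site
CTDB_WAIT_TCP_PORTS="445 139"  # wait for Samba's ports before starting ctdb
```

All other variables can stay commented out so that the documented defaults apply.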

/etc/ctdb/nodes

This file needs to be created as /etc/ctdb/nodes and contains a list of the private IP addresses that the CTDB daemons will use in your cluster. This should be a private non-routable subnet which is only used for CTDB traffic. This file must be the same on all nodes in the cluster.

Example:

10.1.1.1
10.1.1.2
10.1.1.3
10.1.1.4

/etc/ctdb/public_addresses

This file is only required if you plan to use IP takeover. In order to use IP takeover you must specify which interface to use in /etc/sysconfig/ctdb by specifying the PUBLIC_INTERFACE variable. You must also specify the list of public IP addresses to use in this file.

This file contains a list (one for each node) of public cluster addresses. These are the addresses that the smbd daemons will bind to. This file must contain one address for each node, i.e. it must have the same number of entries as the nodes file.

Example:

192.168.1.1/24
192.168.1.2/24
192.168.2.1/24
192.168.2.2/24

These are the IP addresses that you should configure in DNS for the name of the clustered Samba server, and they are the addresses that CIFS clients will connect to. The CTDB cluster uses IP takeover techniques to ensure that as long as at least one node in the cluster is available, all the public IP addresses will always be available to clients.

A CTDB node will only take over IP addresses that are in the same subnet as its own public IP address. In the example above, nodes 0 and 1 can take over each other's public IP addresses, and likewise nodes 2 and 3, but nodes 0 and 1 can NOT take over the IP addresses of nodes 2 or 3 since those are on a different subnet.

Do not assign these addresses to any of the interfaces on the host. CTDB will add and remove these addresses automatically at runtime.
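
The requirement that public_addresses has exactly as many entries as the nodes file is easy to get wrong when nodes are added. A minimal sketch of a check follows; it uses temporary sample files in place of /etc/ctdb/nodes and /etc/ctdb/public_addresses, and the helper name count_entries is made up for this example.

```shell
#!/bin/sh
# Sketch: confirm the nodes and public_addresses files have the same
# number of entries. Sample files stand in for the real ones.
workdir=$(mktemp -d)
printf '%s\n' 10.1.1.1 10.1.1.2 10.1.1.3 10.1.1.4 > "$workdir/nodes"
printf '%s\n' 192.168.1.1/24 192.168.1.2/24 192.168.2.1/24 192.168.2.2/24 \
    > "$workdir/public_addresses"

# Count non-empty lines only, so a trailing blank line does not skew the result.
count_entries() {
    grep -c . "$1"
}

nodes_count=$(count_entries "$workdir/nodes")
public_count=$(count_entries "$workdir/public_addresses")

if [ "$nodes_count" -eq "$public_count" ]; then
    check_result="ok: $nodes_count entries in each file"
else
    check_result="mismatch: $nodes_count nodes, $public_count public addresses"
fi
echo "$check_result"
rm -rf "$workdir"
```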

/etc/ctdb/events

This is a script that CTDB calls when certain events occur, so that site-specific tasks can be performed. The events currently implemented are:

1, when the node takes over an ip address
2, when the node releases an ip address
3, when recovery has completed and the cluster is reconfigured
4, when the cluster performs a clean shutdown
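
The four events above suggest a simple dispatch structure for a site-specific script. The sketch below is hypothetical: this page does not say how ctdb passes the event to the script, so the event names (takeip, releaseip, recovered, shutdown) and the idea that the event arrives as the first argument are assumptions made for illustration. Check the events script shipped with your ctdb tree for the real interface.

```shell
#!/bin/sh
# Sketch of an event dispatcher. The event names and the argument
# convention are ASSUMED for illustration; they are not taken from ctdb.
handle_event() {
    event=$1
    shift
    case "$event" in
        takeip)    echo "took over address: $*" ;;
        releaseip) echo "released address: $*" ;;
        recovered) echo "recovery completed" ;;
        shutdown)  echo "clean shutdown" ;;
        *)         echo "unknown event: $event" >&2; return 1 ;;
    esac
}

# Dispatch only when arguments were supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    handle_event "$@"
fi
```

A real script would replace the echo lines with the site-specific work, such as restarting a service after an address is taken over.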

/etc/services

CTDB uses TCP port 9001 for its traffic by default. To configure a different port for CTDB traffic, add a ctdb entry to the /etc/services file.

Example: to change CTDB to use port 9999, add the following line to /etc/services

ctdb  9999/tcp

Note: all nodes in the cluster MUST use the same port or else CTDB will not start correctly.
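
Because every node must agree on the port, it can help to print the port each node's services file would yield and compare the results. The following is a sketch under stated assumptions: the function name ctdb_port is made up, it only mimics the lookup by reading the file directly, and it runs against a temporary file rather than the real /etc/services.

```shell
#!/bin/sh
# Sketch: report the TCP port configured for ctdb in a services file,
# falling back to the default 9001 when there is no ctdb entry.
ctdb_port() {
    port=$(awk '$1 == "ctdb" && $2 ~ /\/tcp$/ {
                    sub(/\/tcp$/, "", $2); print $2; exit
                }' "$1")
    echo "${port:-9001}"
}

# Demonstration on a temporary file instead of the real /etc/services.
services=$(mktemp)
echo "ctdb  9999/tcp" > "$services"
configured=$(ctdb_port "$services")
: > "$services"                      # empty file: no ctdb entry
fallback=$(ctdb_port "$services")
echo "configured=$configured fallback=$fallback"
rm -f "$services"
```

Running ctdb_port /etc/services on each node and comparing the numbers is one way to spot a mismatch before starting the cluster.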

Starting the cluster

There is an example startup script in samba_3_0_ctdb/examples/ctdb/cluster_start.sh. This script will read your cluster_nodes.txt and create smb.conf files for each node, and start smbd and ctdbd on each node of the cluster.

Loopback Setup

For testing purposes you can set up a Samba/CTDB cluster on a single computer using loopback networking. To set this up you need to do this:

- use ifconfig to create IP aliases for your loopback device for each virtual node
- put the list of aliased IP addresses in cluster_nodes.txt

For example, in order to create loopback devices 2 through 4 (loopback device 1 already exists on most systems), you could do this:

 for i in `seq 2 4`; do
   ifconfig lo:$i 127.0.0.$i
 done

Then, to configure these, you would create a cluster_nodes.txt with the lines:

 127.0.0.1
 127.0.0.2
 127.0.0.3
 127.0.0.4
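
For a different number of virtual nodes, the same file can be generated with a loop. This is a minimal sketch: node_count is an illustrative variable, and the output goes to a temporary directory so the example does not overwrite a real cluster_nodes.txt.

```shell
#!/bin/sh
# Sketch: write one loopback address per virtual node.
node_count=4
outdir=$(mktemp -d)

i=1
while [ "$i" -le "$node_count" ]; do
    echo "127.0.0.$i"
    i=$((i + 1))
done > "$outdir/cluster_nodes.txt"

generated=$(cat "$outdir/cluster_nodes.txt")
echo "$generated"
rm -rf "$outdir"
```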

Then start the cluster as above. For the system to start you also need an onnode script in your path. For this simple example of a simulated cluster on a single computer, you can create it by renaming the onnode.loop example script to onnode. You rarely need to invoke this script directly; the cluster startup script uses it to execute commands remotely on other cluster nodes.

There is a second example script, onnode.ssh, which is not needed for this example but which could be renamed to onnode, instead of onnode.loop, for a multi-computer cluster. The last line of onnode.ssh contains the sample command for starting ssh and may need changing (e.g. for certain Kerberized ssh configurations) when the cluster runs over multiple computers.

Testing your cluster

Once your cluster is up and running, you will want to check that it is functioning correctly. The following tests may help with that.

Using ctdb

The ctdb package comes with a utility called ctdb that can be used to view the behaviour of the ctdb cluster. If you run it with no options it will provide some terse usage information. The most commonly used commands are:

- ctdb ping
- ctdb -n all status

Using smbcontrol

You can check for connectivity to the smbd daemons on each node using smbcontrol:

- smbcontrol smbd ping

Using Samba4 smbtorture

The Samba4 version of smbtorture has several tests that can be used to benchmark a CIFS cluster. You can download Samba4 like this:

 svn co svn://svnanon.samba.org/samba/branches/SAMBA_4_0

Then configure and compile it as usual. The particular tests that are helpful for cluster benchmarking are the RAW-BENCH-OPEN, RAW-BENCH-LOCK and BENCH-NBENCH tests. These tests take a unclist that allows you to spread the workload out over more than one node. For example:

 smbtorture //localhost/data -Uuser%password  RAW-BENCH-LOCK --unclist=unclist.txt --num-progs=32 -t60

A suitable unclist.txt is generated in your $PREFIX/lib directory when you run cluster_start.sh.
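
If you need a unclist for a different set of nodes, one can be built by hand. The sketch below rests on two assumptions that this page does not confirm: that the file holds one //address/share entry per line, and that the share is called data as in the command above. The file written by cluster_start.sh is the authoritative example of the format.

```shell
#!/bin/sh
# Sketch: build a unclist with one //address/share entry per node.
# ASSUMPTIONS: one UNC path per line; share name "data" as in the
# smbtorture command above. make_unclist is a made-up helper name.
nodes=$(mktemp)
printf '%s\n' 127.0.0.1 127.0.0.2 127.0.0.3 127.0.0.4 > "$nodes"

make_unclist() {
    # $1 = file with one address per line, $2 = share name
    while read -r addr; do
        [ -n "$addr" ] && echo "//$addr/$2"
    done < "$1"
    return 0
}

unclist=$(make_unclist "$nodes" data)
echo "$unclist"
rm -f "$nodes"
```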

For NBENCH testing you need a client.txt file. A suitable file can be found in the dbench distribution at http://samba.org/ftp/tridge/dbench/

Setting up CTDB for clustered NFS

Configure CTDB as above and set it up to use public IP addresses. Verify that the CTDB cluster works.

sm-notify

Make sure you have the sm-notify tool installed in /usr/sbin. CTDB requires this tool to trigger lock recovery after an IP address failover/failback.

/etc/exports

Export the same directory from all nodes. Also make sure to specify the fsid export option so that all nodes present the same fsid to clients. Clients can get "upset" if the fsid on a mount suddenly changes.

 /gpfs0/data *(rw,fsid=1235)
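
A quick way to catch an export that was added without an fsid is to list the export lines that lack the option. This is a sketch only: the helper name missing_fsid is made up, a temporary file stands in for /etc/exports, and the /gpfs0/other entry is an invented example of a faulty line.

```shell
#!/bin/sh
# Sketch: list export lines that lack an fsid= option.
# /gpfs0/other is a made-up entry used to show a failing case.
exports=$(mktemp)
cat > "$exports" <<'EOF'
/gpfs0/data *(rw,fsid=1235)
/gpfs0/other *(rw)
EOF

missing_fsid() {
    # Skip blank lines and comments, then print lines without "fsid=".
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | grep -v 'fsid=' || true
}

missing=$(missing_fsid "$exports")
echo "exports without fsid: $missing"
rm -f "$exports"
```

Running the same check on every node, and comparing the fsid values themselves, helps confirm that all nodes present identical exports.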

/etc/sysconfig/nfs

This file must be edited so that statd keeps its state directory on shared storage instead of in a local directory. statd must also listen on a fixed port that is the same on all nodes in the cluster. If no fixed port is specified, the statd port will change during failover, which causes problems on some clients.

This file should look something like:

 STATD_SHARED_DIRECTORY=/gpfs0/nfs-state
 STATD_HOSTNAME="ctdb -P $STATD_SHARED_DIRECTORY/192.168.1.1 -H /etc/ctdb/statd-callout -p 97"

statd state directories

For each node, create a state directory on shared storage where each local statd daemon can keep its state information. This needs to be on shared storage because a node that takes over an IP address needs to find the list of monitored clients to notify. If you have four nodes with the public addresses listed above, the following directories need to be created on shared storage:

 mkdir /gpfs0/nfs-state
 mkdir /gpfs0/nfs-state/192.168.1.1
 mkdir /gpfs0/nfs-state/192.168.1.2
 mkdir /gpfs0/nfs-state/192.168.2.1
 mkdir /gpfs0/nfs-state/192.168.2.2
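
With more nodes, the directories can be derived from the public addresses instead of being typed by hand. The following is a sketch under stated assumptions: a temporary directory stands in for the shared /gpfs0 storage, and the sample address list mirrors the public_addresses example.

```shell
#!/bin/sh
# Sketch: create one statd state directory per public address.
# A temporary directory stands in for the shared /gpfs0 path.
state_root=$(mktemp -d)
addrs="$state_root/public_addresses.sample"
printf '%s\n' 192.168.1.1/24 192.168.1.2/24 192.168.2.1/24 192.168.2.2/24 > "$addrs"

while read -r entry; do
    ip=${entry%%/*}              # strip the /24 netmask suffix
    [ -n "$ip" ] && mkdir -p "$state_root/nfs-state/$ip"
done < "$addrs"

created=$(ls "$state_root/nfs-state")
echo "$created"
rm -rf "$state_root"
```

On a real cluster the loop would read /etc/ctdb/public_addresses and create the directories under the path given in STATD_SHARED_DIRECTORY.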

IMPORTANT

Never mount the same NFS share on a client from two different nodes in the cluster at the same time. The client-side caching in NFS is very fragile and relies on a single object being visible through only one mount at a time.