Setting up pCIFS using Samba and CTDB

As of April 2007 you can set up a simple Samba3 or Samba4 CTDB cluster, running either on loopback (with simulated nodes) or on a real cluster with TCP. This page will tell you how to get started.

Clustering Model

The setup instructions on this page are modelled on setting up a cluster of N nodes that function in nearly all respects as a single multi-homed node. So the cluster will export N IP interfaces, each of which is equivalent (same shares) and which offers coherent CIFS file access across all nodes.

The clustering model utilizes IP takeover techniques to ensure that the full set of public IP addresses assigned to services on the cluster will always be available to the clients even when some nodes have failed and become unavailable.

Samba Configuration

You need to initialise the Samba password database, e.g.

 smbpasswd -a root

Samba with clustering must use the tdbsam or ldap SAM passdb backends (it must not use the default smbpasswd backend), or must be configured to be a member of a domain. The rest of the configuration of Samba is exactly as it is done on a normal system. See the docs on http://samba.org/ for details.

Critical smb.conf parameters

A clustered Samba install must set some specific configuration parameters:

netbios name = something *
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-1999999

NB:

See idmap(8) for more information about the idmap configuration
netbios name should be the same on all nodes
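
As a quick sanity check, the parameters above can be looked for with a small shell sketch. The function name check_clustered_conf is an illustrative assumption, not part of Samba or CTDB, and the patterns only match literal settings in the given file (settings stored in the registry are not seen).

```shell
#!/bin/sh
# Hypothetical helper: check that a smb.conf-style file sets the
# clustering parameters listed above. Prints each pattern that is not
# matched and returns non-zero if any is absent.
check_clustered_conf() {
    conf="$1"
    missing=0
    for pattern in \
        '^[[:space:]]*clustering[[:space:]]*=[[:space:]]*yes[[:space:]]*$' \
        '^[[:space:]]*netbios name[[:space:]]*=[[:space:]]*[^[:space:]]' \
        '^[[:space:]]*idmap config \* : backend[[:space:]]*='
    do
        if ! grep -Eiq "$pattern" "$conf"; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_clustered_conf /etc/samba/smb.conf on each node (the path is an assumption, it depends on how Samba was installed) gives a first indication of whether the node is configured for clustering.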

If using the Samba registry then these must be set in smb.conf:

ctdb:registry.tdb=yes
include=registry

CTDB Cluster Configuration

These are the primary configuration files for CTDB. When CTDB is installed, it will install template versions of these files which you need to edit to suit your system. The current set of config files for CTDB is also available in the /usr/src/ctdb/config directory.

CTDB configuration file

The preferred file for CTDB configuration is /etc/ctdb/ctdbd.conf. Linux distribution-specific configuration files such as /etc/sysconfig/ctdb or /etc/default/ctdb are also supported. Depending on how you install CTDB, a template configuration file may be installed.

The most important options are:

* CTDB_NODES
* CTDB_RECOVERY_LOCK
* CTDB_PUBLIC_ADDRESSES
* CTDB_MANAGES_SAMBA

Please see ctdbd.conf(5) for more details.
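
For orientation, here is a minimal sketch of how these options might be set, using the same shell-variable syntax as the CTDB_PUBLIC_NETWORK example later on this page. All values are assumptions: in particular the recovery lock path /clusterfs/.ctdb/reclock is hypothetical and must be replaced with a file on your own cluster filesystem.

```shell
# Sketch of a CTDB configuration file (all values are assumptions).

# List of private cluster addresses, identical on every node
CTDB_NODES=/etc/ctdb/nodes

# Lock file on the shared cluster filesystem (hypothetical path)
CTDB_RECOVERY_LOCK=/clusterfs/.ctdb/reclock

# Public addresses managed by IP takeover
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

# Let CTDB start, stop and monitor Samba
CTDB_MANAGES_SAMBA=yes
```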

Recovery lock

The recovery lock, configured via CTDB_RECOVERY_LOCK, provides important split-brain prevention and is usually configured to point to a lock file in the cluster filesystem. See the RECOVERY LOCK section in ctdb(7) for more details.

/etc/ctdb/nodes

By default CTDB_NODES points to /etc/ctdb/nodes, which contains a list of the private IP addresses that the CTDB daemons will use in your cluster. This should be a private non-routable subnet which is only used for internal cluster traffic. This file must be the same on all nodes in the cluster.

Example:

10.1.1.1
10.1.1.2
10.1.1.3
10.1.1.4
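
Because a malformed nodes file is easy to miss, a small check such as the following sketch can help. The function name check_nodes_file is hypothetical, and the check assumes a file that contains only IPv4 addresses, one per line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a CTDB nodes file.
# Succeeds only if every non-empty line looks like a dotted-quad IPv4
# address and no address is listed twice.
check_nodes_file() {
    awk '
        /^[[:space:]]*$/ { next }    # skip blank lines
        !/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            printf "line %d: not an IPv4 address: %s\n", NR, $0
            bad = 1
        }
        seen[$0]++ {
            printf "line %d: duplicate address: %s\n", NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

To confirm that the file really is the same everywhere, comparing checksums across the cluster (for example with onnode all md5sum /etc/ctdb/nodes) is one possible approach.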

/etc/ctdb/public_addresses

This file is only required if you plan to use IP takeover. The CTDB_PUBLIC_ADDRESSES configuration variable must be set to point to this file, otherwise it will be ignored. The file contains a list of public IP addresses, one per line, each with an optional (comma-separated) list of network interfaces that can have that address assigned. These are the addresses that the SMBD daemons will bind to.

Example:

192.168.1.1/24 eth1
192.168.1.2/24 eth1
192.168.2.1/24 eth2
192.168.2.2/24 eth2

If network interfaces are not specified on all lines in the public addresses file then the CTDB_PUBLIC_INTERFACE configuration variable must be used to specify a default interface.
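
To find out whether a default interface is needed, the entries without an interface list can be printed with a sketch like this one. The function name addresses_without_interface is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every entry in a public addresses file that
# has no interface list, i.e. the entries that rely on a default interface.
# Lines starting with "#" are ignored.
addresses_without_interface() {
    awk 'NF == 1 && $1 !~ /^#/ { print $1 }' "$1"
}
```

If addresses_without_interface /etc/ctdb/public_addresses prints anything, a default interface has to be configured as described above.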

These are the IP addresses that you should configure in DNS for the name of the clustered samba server and are the addresses that CIFS clients will connect to. The CTDB cluster utilizes IP takeover techniques to ensure that as long as at least one node in the cluster is available, all the public IP addresses will always be available to clients.

Do not manually assign these addresses to any of the interfaces on the host. CTDB will add and remove these addresses automatically at runtime.

There is no built-in restriction on the number of IP addresses or network interfaces that can be used. However, performance limitations (e.g. time taken to calculate IP address distribution, time taken to break TCP connections and delete IPs from interfaces, ...) introduce practical limits. With a small number of nodes it is sensible to plan the IP addresses so that they can be evenly redistributed across subsets of nodes. For example, a 4 node cluster will always be able to evenly distribute 12 public IP addresses (across 4, 3, 2, 1 nodes). Having IP addresses evenly balanced is not a hard requirement but evenly balancing IP addresses is the only method of load balancing used by CTDB.
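
The arithmetic behind the 12-address example can be checked with a short sketch. The function name uneven_node_counts is hypothetical; it lists the numbers of surviving nodes across which a given count of public IP addresses cannot be divided evenly.

```shell
#!/bin/sh
# Print the node counts (from the full cluster size down to 1) across
# which a given number of public IP addresses can NOT be split evenly.
# $1 = number of public IP addresses, $2 = number of nodes in the cluster.
uneven_node_counts() {
    ips="$1"
    n="$2"
    while [ "$n" -ge 1 ]; do
        if [ $((ips % n)) -ne 0 ]; then
            echo "$n"
        fi
        n=$((n - 1))
    done
}
```

uneven_node_counts 12 4 prints nothing, because 12 is divisible by 4, 3, 2 and 1. With 10 addresses on the same cluster it prints 4 and 3, the node counts at which some nodes would host more addresses than others.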

The public addresses file can differ between nodes, allowing subsets of nodes to host particular public IP addresses. Note that pathological configurations can result in undesirable IP address distribution.

/etc/ctdb/events.d

This directory contains event scripts that are called out to by CTDB when certain events occur. Event scripts support health monitoring, service management, IP failover, internal CTDB operations and features. They handle events such as startup, shutdown, monitor, releaseip and takeip.

Please see the service scripts that are installed by ctdb in /etc/ctdb/events.d for examples of how to configure other services to be aware of the HA features of CTDB.

Also see /etc/ctdb/events.d/README for additional documentation on how to write and modify event scripts.

/etc/services

CTDB defaults to use IANA assigned TCP port 4379 for its traffic. Configuring a different port to use for CTDB traffic is done by adding a ctdb entry to the /etc/services file.

Example: to change CTDB to use port 9999, add the following line to /etc/services:

ctdb  9999/tcp

Note: all nodes in the cluster MUST use the same port or else CTDB will not start correctly.
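
Since a port mismatch stops CTDB from starting correctly, it can be useful to see which port a node would use. The following is a sketch only: the function name ctdb_port is hypothetical and it simply parses a services-style file rather than asking CTDB itself.

```shell
#!/bin/sh
# Hypothetical helper: print the TCP port configured for "ctdb" in a
# services(5)-style file, or the default 4379 if there is no such entry.
ctdb_port() {
    awk -v default_port=4379 '
        $1 == "ctdb" && $2 ~ /^[0-9]+\/tcp$/ {
            sub(/\/tcp$/, "", $2)    # keep only the port number
            port = $2
        }
        END { print (port != "" ? port : default_port) }
    ' "$1"
}
```

Running ctdb_port /etc/services on every node and comparing the results shows whether all nodes agree on the port.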

Name resolution

You need to set up some method for your Windows and NFS clients to find the nodes of the cluster, and automatically balance the load between the nodes. We recommend that you set up a round-robin DNS entry for your cluster, listing all the public IP addresses that CTDB will be managing as a single DNS A record.
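
One way to keep the round-robin DNS entry in step with the cluster is to generate it from the public addresses file. The sketch below assumes a BIND-style zone file; the function name public_addresses_to_a_records and the host name passed to it are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a CTDB public addresses file into
# round-robin A records in BIND zone-file syntax.
# $1 = public addresses file, $2 = host name for the cluster (an assumption).
public_addresses_to_a_records() {
    awk -v name="$2" '
        NF >= 1 && $1 !~ /^#/ {
            split($1, parts, "/")    # drop the /mask suffix
            printf "%s\tIN\tA\t%s\n", name, parts[1]
        }
    ' "$1"
}
```

For example, public_addresses_to_a_records /etc/ctdb/public_addresses smbcluster prints one A record per public address, all for the same name, which can then be pasted into the zone for your domain.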

You may also wish to set up a static WINS server entry listing all of your cluster nodes' IP addresses.

Managing Network Interfaces

The default install of CTDB is able to add/remove IP addresses from your network interfaces using the CTDB_PUBLIC_ADDRESSES option shown above.

For more sophisticated interface management you will need to add a new event script in /etc/ctdb/events.d/.

For example, say you wanted CTDB to add a default route when it brings an interface up. You could have an event script called /etc/ctdb/events.d/11.route that looks like this:

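#!/bin/sh

# Load the CTDB helper functions and the CTDB configuration
. /etc/ctdb/functions
loadconfig ctdb

# The first argument is the event name; the rest belong to the event
cmd="$1"
shift

case "$cmd" in
    takeip)
         # for takeip, $1 is now the interface that receives the address
         # we ignore errors from this, as the route might be up already when we're grabbing
         # a 2nd IP on this interface
         /sbin/ip route add "$CTDB_PUBLIC_NETWORK" via "$CTDB_PUBLIC_GATEWAY" dev "$1" 2> /dev/null
         ;;
esac

exit 0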

Then you would put CTDB_PUBLIC_NETWORK and CTDB_PUBLIC_GATEWAY in /etc/sysconfig/ctdb like this:

CTDB_PUBLIC_NETWORK="10.1.2.0/24"
CTDB_PUBLIC_GATEWAY="10.1.2.1"

Filesystem specific configuration

The cluster filesystem you use with CTDB plays a critical role in ensuring that CTDB works seamlessly. Here are some filesystem-specific tips.

If you are interested in testing a new cluster filesystem with CTDB then we strongly recommend looking at the page on testing filesystems using ping_pong to ensure that the cluster filesystem supports correct POSIX locking semantics.

IBM's GPFS filesystem

The GPFS filesystem (see http://www-03.ibm.com/systems/clusters/software/gpfs.html) is a proprietary cluster filesystem that has been extensively tested with CTDB/Samba. When using GPFS, the following smb.conf settings are recommended:

clustering = yes
idmap backend = tdb2
fileid:mapping = fsname
vfs objects = gpfs fileid
gpfs:sharemodes = No
force unknown acl user = yes
nfs4: mode = special
nfs4: chown = yes
nfs4: acedup = merge

The ACL-related options should only be enabled if you have NFSv4 ACLs enabled on your filesystem.

The most important of these options is "fileid:mapping". You risk data corruption if you use a different mapping backend with Samba and GPFS, because locking will break across nodes. NOTE: You must also load "fileid" as a vfs object in order for this to take effect.

A guide to configuring Samba with CTDB and GPFS can be found at Samba CTDB GPFS Cluster HowTo.

RedHat GFS filesystem

Red Hat GFS is a native file system that interfaces directly with the Linux kernel file system interface (VFS layer).

The gfs_controld daemon manages mounting, unmounting, recovery and POSIX locks. Edit /etc/init.d/cman (if using RedHat Cluster Suite) to start gfs_controld with the '-l 0 -o 1' flags to optimize POSIX locking performance. You'll notice the difference this makes by running the ping_pong test with and without these options.

A complete HowTo document to set up clustered Samba with CTDB and GFS2 is here: GFS CTDB HowTo

Lustre filesystem

Lustre® is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by a number of companies (Intel, Seagate) and OpenSFS, which is a not-for-profit organisation.

Tests have been done on Lustre releases 1.4.x and 1.6.x with CTDB/Samba. The current Lustre release is 2.5.2. When mounting Lustre, the "-o flock" option should be specified to enable cluster-wide byte-range locking among all Lustre clients.

These two versions have different mechanisms of configuration and startup. More information is available at http://wiki.lustre.org.

Although Lustre configuration differs between the two versions, setting up CTDB/Samba is the same on both. The following settings are recommended:

clustering = yes
idmap backend = tdb2
fileid:mapping = fsname
use mmap = no
nt acl support = yes
ea support = yes

The "fileid:mapping" and "use mmap" options must be specified to avoid possible data corruption. The "nt acl support" option maps POSIX ACLs to the Windows NT format. At the moment, Lustre only supports POSIX ACLs.

GlusterFS filesystem

GlusterFS is a cluster file system capable of scaling to several petabytes that is easy to configure. It aggregates various storage bricks over Infiniband RDMA or TCP/IP interconnect into one large parallel network file system. GlusterFS is based on a stackable user space design without compromising performance. It uses Linux File System in Userspace (FUSE) to achieve all this.

NOTE: GlusterFS has not yet had extensive testing but this is currently underway.

GlusterFS versions 2.0 to 2.0.4 must be patched with:

http://patches.gluster.com/patch/813/

This is to ensure GlusterFS passes the ping_pong test. This issue is being tracked at:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=159

Update: As of GlusterFS 2.0.6 this has been fixed.

OCFS2

OCFS2 - see http://oss.oracle.com/projects/ocfs2/

Recommended settings:

fileid:mapping = fsid
vfs objects = fileid

OCFS2 1.4 offers cluster-wide byte-range locking.

Starting the cluster

Just start the ctdb service on all nodes. A sample init script (works for RedHat) is located in /usr/src/ctdb/config/ctdb.init

If you have taken advantage of the ability of CTDB to start other services, then you should disable those other services with chkconfig, or your system's service configuration tool. Those services will instead be started by ctdb using the /etc/ctdb/events.d service scripts.

If you wish to cope with software faults in ctdb, or want ctdb to automatically restart when an administrator kills it, then you may wish to add a cron entry for root like this:

* * * * * /etc/init.d/ctdb cron > /dev/null 2>&1

Testing your cluster

Once your cluster is up and running, you may wish to know how to test that it is functioning correctly. The following tests may help with that.

Using ctdb

The ctdb package comes with a utility called ctdb that can be used to view the behaviour of the ctdb cluster. If you run it with no options it will provide some terse usage information. Some commonly used commands are:

ctdb status
ctdb ping
ctdb ip

ctdb status

The ctdb status command provides basic information about the cluster and the status of the nodes. When you run it you will get some output like:

 Number of nodes:4
 vnn:0 10.1.1.1       OK (THIS NODE)
 vnn:1 10.1.1.2       OK
 vnn:2 10.1.1.3       OK
 vnn:3 10.1.1.4       OK
 Generation:1362079228
 Size:4
 hash:0 lmaster:0
 hash:1 lmaster:1
 hash:2 lmaster:2
 hash:3 lmaster:3
 Recovery mode:NORMAL (0)
 Recovery master:0

The important parts are the node states and the recovery mode:

All 4 nodes are in a healthy state
Recovery mode is normal, which means that the cluster has finished a recovery and is running in a normal fully operational state

Recovery state will briefly change to RECOVERY when there has been a node failure or something is wrong with the cluster.

If the cluster remains in RECOVERY state for very long (many seconds) there might be a configuration problem. Check the logs for details.
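
The two checks described above can be automated with a sketch like the following. It assumes exactly the ctdb status output format shown on this page (other CTDB versions may print the node lines differently), and the function name cluster_is_healthy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ctdb status" output on stdin (in the format
# shown above) and succeed only if every node is OK and the recovery
# mode is NORMAL.
cluster_is_healthy() {
    awk '
        /^[[:space:]]*vnn:/ {
            nodes++
            if ($3 != "OK") unhealthy++    # third field is the node state
        }
        /Recovery mode:NORMAL/ { normal = 1 }
        END { exit ((nodes > 0 && unhealthy == 0 && normal) ? 0 : 1) }
    '
}
```

It would typically be used as ctdb status | cluster_is_healthy, for example from a monitoring script that raises an alert when the command fails.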

ctdb ping

The ctdb ping command ensures the local CTDB daemon is running and shows how many clients are connected.

 # onnode -q all ctdb ping

 response from 0 time=0.000050 sec  (13 clients)
 response from 1 time=0.000154 sec  (27 clients)
 response from 2 time=0.000114 sec  (17 clients)
 response from 3 time=0.000115 sec  (59 clients)

ctdb ip

The ctdb ip command shows the public IP addresses and which node is hosting them.

 Number of nodes:4
 192.168.1.1         0
 192.168.1.2         1
 192.168.2.1         2
 192.168.2.2         3

A value of -1 for a node number indicates that an address is not currently hosted.
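
A short filter can pick out the addresses that are not hosted anywhere. This is a sketch that assumes the ctdb ip output format shown above; the function name unhosted_addresses is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ctdb ip" output on stdin (in the format shown
# above) and print the public addresses that no node is hosting.
unhosted_addresses() {
    awk 'NF == 2 && $2 == "-1" { print $1 }'
}
```

For example, ctdb ip | unhosted_addresses prints nothing when every public address is assigned to a node.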

Using smbcontrol

You can check for connectivity to the smbd daemons on each node using smbcontrol:

- smbcontrol smbd ping

Using Samba4 smbtorture

The Samba4 version of smbtorture has several tests that can be used to benchmark a CIFS cluster. You can download Samba4 like this:

 git clone git://git.samba.org/samba.git
 cd samba/source4

Then configure and compile it as usual. The particular tests that are helpful for cluster benchmarking are the RAW-BENCH-OPEN, RAW-BENCH-LOCK and BENCH-NBENCH tests. These tests take a unclist that allows you to spread the workload out over more than one node. For example:

 smbtorture //localhost/data -Uuser%password  RAW-BENCH-LOCK --unclist=unclist.txt --num-progs=32 -t60

The file unclist.txt should contain a list of shares in your cluster (UNC format: //server/share). For example:

//node1/data
//node2/data
//node3/data
//node4/data
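
For larger clusters the unclist file can be generated rather than typed. The following sketch assumes every node exports the same share name; the function name make_unclist is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one UNC path per node for a given share.
# $1 = share name, remaining arguments = node host names.
make_unclist() {
    share="$1"
    shift
    for node in "$@"; do
        printf '//%s/%s\n' "$node" "$share"
    done
}
```

Running make_unclist data node1 node2 node3 node4 > unclist.txt reproduces the example file above.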

For NBENCH testing you need a client.txt file. A suitable file can be found in the dbench distribution at http://samba.org/ftp/tridge/dbench/
