Setting up CTDB for Clustered NFS
- 1 Assumptions
- 2 Prerequisites
- 3 NFS configuration
- 4 Configure CTDB to manage NFS
- 5 Client configuration
- 6 Event scripts
- 7 Using CTDB with other NFS servers
- 8 Troubleshooting
Assumptions

This guide is aimed at the Linux kernel NFS daemon. CTDB can be made to manage another NFS server by using the CTDB_NFS_CALLOUT configuration variable to specify an NFS server-specific call-out.
Prerequisites

- Basic CTDB configuration
- Setting up a cluster filesystem
- Configuring the CTDB recovery lock (recommended)
- Adding public IP addresses (or some other failover/load balancing scheme)
- NFS exports must be the same on all nodes
- For each export, the fsid option must be set to the same value on all nodes. For the Linux kernel NFS server, this is usually done in /etc/exports:

```
/clusterfs0/data *(rw,fsid=1235)
/clusterfs0/misc *(rw,fsid=1237)
```
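One way to verify this requirement is to list the fsid values on each node (for example via `onnode -c all`) and compare the output. The helper below is a sketch, not part of CTDB; it assumes exports live in a file in /etc/exports format:

```shell
#!/bin/sh
# Sketch: list the fsid values from an exports file so the output can be
# compared across nodes (e.g. onnode -c all sh list-fsids.sh).
list_fsids() {
    grep -o 'fsid=[0-9]*' "$1" | sort
}

# Demonstration against a scratch copy of the example exports above;
# on a real node you would pass /etc/exports instead.
exports_file=$(mktemp)
cat > "$exports_file" <<'EOF'
/clusterfs0/data *(rw,fsid=1235)
/clusterfs0/misc *(rw,fsid=1237)
EOF
list_fsids "$exports_file"
```

If the sorted lists differ between nodes, the fsid values are inconsistent.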
NFS configuration

Clustering NFS has some extra requirements compared to running a regular NFS server, so some extra configuration is needed.
- All NFS daemons should run on fixed ports, which should be the same on all cluster nodes. Some clients can become confused if ports change during fail-over.
- NFSv4 should be disabled.
- statd should be configured to use CTDB's high-availability call-out.
- statd's hostname should be resolvable to the CTDB public IP addresses, and should be the same name used by Samba. It should be set from the value of NFS_HOSTNAME, since this is used by CTDB's high-availability call-out.
Red Hat Linux variants
The configuration file will be /etc/sysconfig/nfs and it should look something like:

```
NFS_HOSTNAME="ctdb"
RPCNFSDARGS="-N 4"
RPCNFSDCOUNT=32
STATD_PORT=595
STATD_OUTGOING_PORT=596
MOUNTD_PORT=597
RQUOTAD_PORT=598
LOCKD_UDPPORT=599
LOCKD_TCPPORT=599
STATD_HOSTNAME="$NFS_HOSTNAME"
STATD_HA_CALLOUT="/etc/ctdb/statd-callout"
```
Configure CTDB to manage NFS
In the CTDB configuration, tell CTDB that you want it to manage NFS:
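The exact mechanism depends on the CTDB version, so treat the following as a sketch and check your version's documentation. Older releases used a variable in the CTDB sysconfig file; newer releases instead enable the relevant legacy event scripts:

```
# Older CTDB releases: e.g. in /etc/sysconfig/ctdb
CTDB_MANAGES_NFS=yes

# Newer CTDB releases: enable the NFS legacy event scripts instead
ctdb event script enable legacy 06.nfs
ctdb event script enable legacy 60.nfs
```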
CTDB will manage the NFS services, starting, stopping and restarting them as needed, so the operating system should be configured so that they are not started or stopped automatically.
Red Hat Linux variants
If using a Red Hat Linux variant, the NFS services are the nfs and nfslock services. Starting them at boot time is not recommended, and this can be disabled using:

```
chkconfig nfs off
chkconfig nfslock off
```
The service names and mechanism for disabling them varies across operating systems.
Client configuration

IP addresses, rather than a DNS/host name, should be used when configuring client mounts. NFSv3 locking is heavily tied to IP addresses and can break if a client uses round-robin DNS. This means load balancing for NFS is achieved by hand-distributing public IP addresses across clients.
Never mount the same NFS share on a client from two different nodes in the cluster at the same time. The client-side caching in NFS is very fragile and assumes that an object can only be accessed through one single path at a time.
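For example, a client /etc/fstab entry might pin the mount to one public IP address. The address, paths and options below are illustrative only:

```
# /etc/fstab on a client: mount via one CTDB public IP, not a DNS name
10.1.1.1:/clusterfs0/data  /mnt/data  nfs  vers=3,hard  0 0
```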
Event scripts

CTDB clustering for NFS relies on two event scripts: 06.nfs and 60.nfs. These are provided as part of CTDB and do not usually need to be changed.
Using CTDB with other NFS servers
The NFS event scripts provide a generic framework for managing NFS from CTDB. These scripts also include infrastructure for flexible NFS RPC service monitoring. There are two configuration variables that may need to be changed when using an NFS server other than the default (the Linux kernel NFS server).
CTDB_NFS_CALLOUT

This variable is the absolute pathname of the desired NFS call-out used by CTDB's NFS event scripts. If CTDB_NFS_CALLOUT is unset or null then CTDB will use the provided nfs-linux-kernel-callout.

A call-out for NFS-Ganesha called nfs-ganesha-callout is provided as an example as part of CTDB's documentation. This call-out has not been as extensively tested as nfs-linux-kernel-callout.
Writing a call-out
A call-out should implement any required methods. Available methods are:
- startup, shutdown
- Startup or shutdown the entire NFS service.
- start, stop
- Start or stop a subset of services, as referenced from NFS checks (see below).
- releaseip-pre, releaseip, takeip-pre, takeip
- Take actions before or after an IP address is released or taken over during IP failover.
- monitor-list-shares
- List exported directories that should be monitored for existence. This can be used to ensure that cluster filesystems are mounted.
- monitor-pre, monitor-post
- Additional monitoring before or after the standard monitoring of RPC services (see below).
- register
- Should list the names of all implemented methods. This is an optimisation that stops the event scripts from calling unimplemented methods in the call-out.
See the existing call-outs for implementation details and suggested style.
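As an illustration only, a minimal call-out might look like the sketch below. The service name, exports path and method behaviour are assumptions for this sketch, not CTDB's shipped call-out:

```shell
#!/bin/sh
# Minimal NFS call-out sketch. CTDB invokes the call-out with the method
# name as its first argument; unimplemented methods simply do nothing.

methods="startup
shutdown
monitor-list-shares"

case "$1" in
startup)
    # Bring up the whole NFS service ("nfs-server" is an assumed unit name)
    systemctl start nfs-server
    ;;
shutdown)
    systemctl stop nfs-server
    ;;
monitor-list-shares)
    # One exported directory per line, for existence monitoring
    awk '!/^[[:space:]]*(#|$)/ {print $1}' /etc/exports
    ;;
register)
    # Advertise implemented methods so others are never called
    echo "$methods"
    ;;
esac
```

Any method not listed by register is skipped by the event scripts, so a call-out only needs to implement what its NFS server actually requires.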
CTDB_NFS_CHECKS_DIR

This is the absolute pathname of a directory of files that describe how to monitor desired NFS RPC services. It can also be configured to try to restart services if they remain unresponsive. If CTDB_NFS_CHECKS_DIR is unset or null then CTDB uses the set of NFS RPC checks in the nfs-checks.d subdirectory of the CTDB configuration directory.
When providing a different set of NFS RPC checks, create a new subdirectory, such as nfs-checks-ganesha.d, and set CTDB_NFS_CHECKS_DIR to point to this directory. Populate the directory with custom check files and/or symbolic links to the desired checks in the distributed nfs-checks.d directory. This method is upgrade-safe: if you remove certain checks then they will not be replaced when you upgrade CTDB to a newer version.
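The steps above can be sketched as follows. The real check files live under the CTDB configuration directory (often /etc/ctdb), so scratch directories stand in for it here to make the sketch self-contained:

```shell
#!/bin/sh
# Sketch: build nfs-checks-ganesha.d from symlinks to distributed checks.
# Scratch directories stand in for the real CTDB configuration directory.
ctdb_conf=$(mktemp -d)                     # stand-in for /etc/ctdb
mkdir -p "$ctdb_conf/nfs-checks.d"
printf '# example check\n' > "$ctdb_conf/nfs-checks.d/00.portmapper.check"

checks_dir="$ctdb_conf/nfs-checks-ganesha.d"
mkdir -p "$checks_dir"
# Symlink the checks to keep; omit, replace or add files as needed
for f in "$ctdb_conf"/nfs-checks.d/*.check; do
    ln -s "$f" "$checks_dir/"
done

# Then point CTDB at the new directory:
#   CTDB_NFS_CHECKS_DIR="$checks_dir"
ls "$checks_dir"
```

Because the distributed nfs-checks.d itself is left untouched, a CTDB upgrade updates the linked checks without reintroducing any that were deliberately omitted.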
Writing NFS RPC check files
These files are described by the relevant README file. See the files shipped with CTDB for examples.
Troubleshooting

File handle consistency
If fail-over testing shows stale file handles or other unexpected issues, this may be due to a cluster filesystem providing inconsistent device numbers across the nodes of the cluster for an exported filesystem.
NFS implementations often use device numbers when constructing file handles. If file handles are constructed inconsistently across the cluster then this can result in stale file handles. In such cases you should test device and inode number uniformity of your cluster filesystem. If device numbers are inconsistent then it may or may not be possible to configure the NFS implementation to construct file handles using some other algorithm.
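A quick uniformity check is to print device and inode numbers for the same file on each node (for example with `onnode -c all`) and compare. This sketch assumes GNU stat; the test file path is a local stand-in for a file on the cluster filesystem:

```shell
#!/bin/sh
# Sketch: print device:inode for a path. Identical output on every node
# for the same cluster-filesystem file suggests consistent numbering.
device_inode() {
    stat -c '%d:%i' "$1"
}

testfile=$(mktemp)     # in practice: a file on the cluster filesystem
device_inode "$testfile"
```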
libnfs includes an example program called
nfs-fh that can be used to check that file handles are constructed consistently across cluster nodes.
```
# onnode -c all ./examples/nfs-fh nfs://127.0.0.1/testfile

>> NODE: 10.1.1.1 <<
43000be210dd17000000010000feffffffffffffff000000

>> NODE: 10.1.1.2 <<
43000be210dd17000000010000feffffffffffffff000000
```
In this case the file handles are consistent.