Setting up CTDB for Clustered NFS
Assumptions
This guide is aimed at the Linux kernel NFS daemon.
CTDB can be made to manage another NFS server by using the CTDB_NFS_CALLOUT
script option to specify an NFS server-specific call-out.
Prerequisites
- Basic CTDB configuration
- Setting up a cluster filesystem
- Configuring the CTDB recovery lock (recommended)
- Adding public IP addresses (or some other failover/load balancing scheme)
NFS configuration
Exports
Requirements:
- NFS exports must be the same on all nodes
- For each export, the fsid option must be set to the same value on all nodes.
For the Linux kernel NFS server, exports are usually configured in /etc/exports.
Example:
/clusterfs0/data *(rw,fsid=1235)
/clusterfs0/misc *(rw,fsid=1237)
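Since the fsid values must match on every node, it can be worth checking them mechanically. A minimal sketch, assuming a POSIX shell and awk; list_fsids is a hypothetical helper, not part of CTDB:

```shell
#!/bin/sh
# Print "path fsid=N" for each export in an exports(5) file.  Running
# this on every node (e.g. via onnode) and comparing the output shows
# whether the fsid values agree.  Hypothetical helper, not part of CTDB.
list_fsids() {
    awk '{
        for (i = 2; i <= NF; i++)
            if (match($i, /fsid=[0-9]+/))
                print $1, substr($i, RSTART, RLENGTH)
    }' "$1"
}
```

Identical output on every node means the exports agree; any difference points at the offending export.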
Daemon configuration
Clustering NFS has some extra requirements compared to running a regular NFS server, so some extra configuration is needed.
- All NFS daemons should run on fixed ports, which should be the same on all cluster nodes. Some clients can become confused if ports change during fail-over.
- NFSv4 should be disabled.
- statd should be configured to use CTDB's high-availability call-out.
- The NFS_HOSTNAME variable must be set in the NFS system configuration. This configuration is loaded by CTDB's high-availability call-out, which uses NFS_HOSTNAME. NFS_HOSTNAME should resolve to the CTDB public IP addresses that are used by NFS clients. statd's hostname (passed via the -n option) must be set to the value of NFS_HOSTNAME.
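Whether NFS_HOSTNAME actually resolves to the public addresses can be checked with getent. A small sketch; hostname_in_list is a hypothetical helper, and in practice the address list would come from CTDB's public_addresses file:

```shell
#!/bin/sh
# Return success if any address the hostname resolves to appears in the
# list of public IP addresses given as the remaining arguments.
# Hypothetical helper for verifying NFS_HOSTNAME.
hostname_in_list() {
    _host="$1"; shift
    for _addr in $(getent ahosts "$_host" | awk '{ print $1 }' | sort -u); do
        for _pub in "$@"; do
            [ "$_addr" = "$_pub" ] && return 0
        done
    done
    return 1
}
```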
Red Hat Linux variants
The configuration file will be /etc/sysconfig/nfs and it should look something like:
NFS_HOSTNAME="cluster1"
STATD_PORT=32765
STATD_OUTGOING_PORT=32766
MOUNTD_PORT=32767
RQUOTAD_PORT=32768
LOCKD_UDPPORT=32769
LOCKD_TCPPORT=32769
STATDARG="-n ${NFS_HOSTNAME}"
STATD_HA_CALLOUT="/etc/ctdb/statd-callout"
RPCNFSDARGS="-N 4"
RPCNFSDCOUNT=8
This should work with both systemd and Sys-V init variants.
When using systemd, /etc/sysconfig/rpc-rquotad should also contain:
RPCRQUOTADOPTS="-p 32768"
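After restarting the services, the fixed ports can be verified from the output of rpcinfo -p. A sketch assuming the usual Linux rpcinfo output columns; rpc_port is a hypothetical helper:

```shell
#!/bin/sh
# Read "rpcinfo -p" output on stdin and print the port registered for a
# given service name, e.g.:  rpcinfo -p | rpc_port status
# The columns of "rpcinfo -p" are: program vers proto port service.
rpc_port() {
    awk -v svc="$1" '$5 == svc { print $4; exit }'
}
```

Comparing the reported ports against the configured ones on every node confirms that the fixed-port settings took effect.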
Debian GNU/Linux variants
The following configuration files should work for both systemd and Sys-V init:
/etc/default/nfs-kernel-server:
RPCNFSDOPTS="-N 4"
RPCNFSDCOUNT=8
RPCMOUNTDOPTS="-p 32767"
/etc/default/nfs-common:
NFS_HOSTNAME="cluster1"
STATDOPTS="-n ${NFS_HOSTNAME} -p 32765 -o 32766 -H /etc/ctdb/statd-callout -T 32769 -U 32769"
/etc/default/quota:
RPCRQUOTADOPTS="-p 32768"
Unfortunately, RPCNFSDOPTS isn't used by Debian's Sys-V init scripts, so there is no way to disable NFSv4 via the configuration file.
Configure CTDB to manage NFS
The NFS event scripts must be enabled:
ctdb event script enable legacy 60.nfs
ctdb event script enable legacy 06.nfs
CTDB will manage and start/stop/restart the NFS services, so the operating system should be configured so these are not started/stopped automatically.
Samba ≤ 4.8
In the CTDB configuration, tell CTDB that you want it to manage NFS:
CTDB_MANAGES_NFS=yes
The event scripts must also be enabled:
ctdb event script enable 60.nfs
ctdb event script enable 06.nfs
Red Hat Linux variants
If using a Red Hat Linux variant, the NFS services are nfs and nfslock. Starting them at boot time is not recommended; this can be disabled using chkconfig.
chkconfig nfs off
chkconfig nfslock off
The service names and mechanism for disabling them varies across operating systems.
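On systemd-based distributions, for example, something like the following disables automatic start. The unit names vary between distributions (nfs-server and rpc-statd are common) and should be checked locally:

```shell
# Stop the NFS units now and prevent them from starting at boot, so that
# only CTDB starts and stops them.  Unit names are illustrative.
systemctl disable --now nfs-server
systemctl disable --now rpc-statd
```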
Client configuration
Clients should mount using IP addresses rather than a DNS or host name. NFSv3 locking is heavily tied to IP addresses and can break if a client uses round-robin DNS. This means load balancing for NFS is achieved by hand-distributing the public IP addresses across clients.
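For example, one group of clients might be pointed at one public address and another group at a different one (addresses and paths here are illustrative):

```shell
# On one group of clients: mount via the first public IP address
mount -t nfs 10.1.1.1:/clusterfs0/data /mnt/data
# On another group: mount the same export via a different public IP
mount -t nfs 10.1.1.2:/clusterfs0/data /mnt/data
```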
IMPORTANT
Never mount the same NFS share on a client from two different nodes in the cluster at the same time. The client-side caching in NFS is very fragile and assumes that an object can only be accessed through one single path at a time.
Event scripts
CTDB clustering for NFS relies on two event scripts, 06.nfs and 60.nfs. These are provided as part of CTDB and do not usually need to be changed.
Using CTDB with other NFS servers
The NFS event scripts provide a generic framework for managing NFS from CTDB. These scripts also include infrastructure for flexible NFS RPC service monitoring. Two configuration variables may need to be changed when using an NFS server other than the default (the Linux kernel NFS server).
CTDB_NFS_CALLOUT
This variable is the absolute pathname of the desired NFS call-out used by CTDB's NFS event scripts.
If CTDB_NFS_CALLOUT is unset or null then CTDB will use the provided nfs-linux-kernel-callout.
An example call-out for NFS-Ganesha, nfs-ganesha-callout, is provided as part of CTDB's documentation. This call-out has not been as extensively tested as nfs-linux-kernel-callout.
Writing a call-out
A call-out should implement any required methods. Available methods are:
- startup, shutdown: start up or shut down the entire NFS service.
- start, stop: start or stop a subset of services, as referenced from NFS checks (see below).
- releaseip-pre, releaseip, takeip-pre, takeip: take actions before or after an IP address is released or taken over during IP failover.
- monitor-list-shares: list exported directories that should be monitored for existence. This can be used to ensure that cluster filesystems are mounted.
- monitor-pre, monitor-post: additional monitoring before or after the standard monitoring of RPC services (see below).
- register: should list the names of all implemented methods. This is an optimisation that stops the event scripts from calling unimplemented methods in the call-out.
See the existing call-outs for implementation details and suggested style.
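To illustrate the overall shape, here is a minimal sketch written as a shell function. A real call-out is a standalone executable that CTDB invokes as `<callout> <method> [args...]`; the exported path below is illustrative:

```shell
#!/bin/sh
# Minimal sketch of a CTDB NFS call-out, written as a function for
# illustration only.  Each branch would contain the server-specific
# commands for that method.
nfs_callout() {
    case "$1" in
    startup)
        : # start the whole NFS service here
        ;;
    shutdown)
        : # stop the whole NFS service here
        ;;
    monitor-list-shares)
        # print one exported directory per line (illustrative path)
        echo "/clusterfs0/data"
        ;;
    register)
        # advertise the implemented methods, one per line
        printf '%s\n' startup shutdown monitor-list-shares register
        ;;
    *)
        # unimplemented methods must succeed silently
        return 0
        ;;
    esac
}
```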
CTDB_NFS_CHECKS_DIR
This is the absolute pathname of a directory of files that describe how to monitor desired NFS RPC services. It can also be configured to try to restart services if they remain unresponsive.
If CTDB_NFS_CHECKS_DIR is unset or null then CTDB uses the set of NFS RPC checks in the nfs-checks.d subdirectory of the CTDB configuration directory.
When providing a different set of NFS RPC checks, create a new subdirectory, such as nfs-checks-enabled.d or nfs-checks-ganesha.d, and set CTDB_NFS_CHECKS_DIR to point to this directory. Populate the directory with custom check files and/or symbolic links to the desired checks in nfs-checks.d. This method is upgrade-safe: if you remove certain checks then they will not be reinstated when you upgrade CTDB to a newer version.
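Concretely, the setup might look like this. The paths and check file name are illustrative; the actual check files ship with CTDB in nfs-checks.d:

```shell
# Create a directory of enabled checks and point CTDB at it.  Populate
# it with symlinks to the shipped checks plus any custom check files.
# Paths and check file names are illustrative.
mkdir -p /etc/ctdb/nfs-checks-enabled.d
ln -s /etc/ctdb/nfs-checks.d/20.nfs.check /etc/ctdb/nfs-checks-enabled.d/
# ... then in the CTDB configuration:
# CTDB_NFS_CHECKS_DIR=/etc/ctdb/nfs-checks-enabled.d
```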
Writing NFS RPC check files
These files are described by the relevant README file. See the files shipped with CTDB for examples.
Troubleshooting
File handle consistency
If testing shows stale file handles or other unexpected issues during fail-over testing then this may be due to a cluster filesystem providing inconsistent device numbers across the nodes of the cluster for an exported filesystem.
NFS implementations often use device numbers when constructing file handles. If file handles are constructed inconsistently across the cluster then this can result in stale file handles. In such cases you should test the device and inode number uniformity of your cluster filesystem. If device numbers are inconsistent then it may or may not be possible to configure the NFS implementation to construct file handles using some other algorithm.
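A simple way to test uniformity is to stat the same file on every node and compare the output. dev_inode is a hypothetical helper; on a well-behaved cluster filesystem every node prints identical values:

```shell
#!/bin/sh
# Print the device and inode numbers of a path in a comparable form.
# Run the same command on all nodes and compare, e.g.:
#   onnode all stat -c 'dev=%D ino=%i' /clusterfs0/data
dev_inode() {
    stat -c 'dev=%D ino=%i' "$1"
}
```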
libnfs includes an example program called nfs-fh that can be used to check that file handles are constructed consistently across cluster nodes.
# onnode -c all ./examples/nfs-fh nfs://127.0.0.1/testfile
>> NODE: 10.1.1.1 <<
43000be210dd17000000010000feffffffffffffff000000
>> NODE: 10.1.1.2 <<
43000be210dd17000000010000feffffffffffffff000000
In this case the file handles are consistent.