Setting up CTDB for Clustered NFS

Assumptions

This guide is aimed at the Linux kernel NFS daemon.

CTDB can be made to manage another NFS server by using the CTDB_NFS_CALLOUT configuration variable to specify an NFS server-specific call-out.

Prerequisites

NFS configuration

Exports

Requirements:

  • NFS exports must be the same on all nodes
  • For each export, the fsid option must be set to the same value on all nodes.

For the Linux kernel NFS server, this is usually in /etc/exports.

Example:

 /clusterfs0/data *(rw,fsid=1235)
 /clusterfs0/misc *(rw,fsid=1237)
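
Since the exports must be identical everywhere, it is worth checking them across the cluster. For example, using CTDB's onnode tool (assuming the default /etc/exports location):

 onnode all cat /etc/exports

Every node should produce the same output.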

Daemon configuration

Clustering NFS has some extra requirements compared to running a regular NFS server, so some extra configuration is needed.

  • All NFS daemons should run on fixed ports, which should be the same on all cluster nodes. Some clients can become confused if ports change during fail-over.
  • NFSv4 should be disabled.
  • statd should be configured to use CTDB's high-availability call-out.
  • statd's hostname must resolve to the CTDB public IP addresses and should be the same name used by Samba. Set it to the value of NFS_HOSTNAME, since this name is used by CTDB's high-availability call-out.

Red Hat Linux variants

The configuration file will be /etc/sysconfig/nfs and it should look something like:

 NFS_HOSTNAME="ctdb"
 RPCNFSDARGS="-N 4"
 RPCNFSDCOUNT=32
 STATD_PORT=595
 STATD_OUTGOING_PORT=596
 MOUNTD_PORT=597
 RQUOTAD_PORT=598
 LOCKD_UDPPORT=599
 LOCKD_TCPPORT=599
 STATD_HOSTNAME="$NFS_HOSTNAME"
 STATD_HA_CALLOUT="/etc/ctdb/statd-callout"
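
After restarting the NFS services, the fixed port assignments can be confirmed with rpcinfo. For example, using the services from the configuration above:

 rpcinfo -p | egrep 'status|nlockmgr|mountd|rquotad'

Each service should be listening on its configured port, and the output should be the same on every node.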

Configure CTDB to manage NFS

In the CTDB configuration, tell CTDB that you want it to manage NFS:

 CTDB_MANAGES_NFS=yes

CTDB will manage and start/stop/restart the NFS services, so the operating system should be configured so these are not started/stopped automatically.

Red Hat Linux variants

If using a Red Hat Linux variant, the relevant services are nfs and nfslock. Starting them at boot time is not recommended, and this can be disabled using chkconfig.

 chkconfig nfs off
 chkconfig nfslock off

The service names and the mechanism for disabling them vary across operating systems.
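
On systemd-based distributions, for example, services are disabled with systemctl. The unit names below are common but not universal, so check the names used by your distribution:

 systemctl disable nfs-server
 systemctl disable rpc-statd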

Client configuration

IP addresses, rather than a DNS/host name, should be used when configuring client mounts. NFSv3 locking is heavily tied to IP addresses and can break if a client uses round-robin DNS. This means load balancing for NFS is achieved by hand-distributing public IP addresses across clients.
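
For example, a client mount using one of the cluster's public IP addresses directly (this assumes 10.1.1.1 is a CTDB public IP address, and uses an export from the example above):

 mount -t nfs -o nfsvers=3 10.1.1.1:/clusterfs0/data /mnt/data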

IMPORTANT

Never mount the same NFS share on a client from two different nodes in the cluster at the same time. The client-side caching in NFS is very fragile and assumes that an object can only be accessed through a single path at a time.

Event scripts

CTDB clustering for NFS relies on two event scripts, 06.nfs and 60.nfs. These are provided as part of CTDB and do not usually need to be changed.

Using CTDB with other NFS servers

The NFS event scripts provide a generic framework for managing NFS from CTDB. These scripts also include infrastructure for flexible NFS RPC service monitoring. There are two configuration variables that may need to be changed when using an NFS server other than the default Linux kernel NFS server.

CTDB_NFS_CALLOUT

This variable is the absolute pathname of the desired NFS call-out used by CTDB's NFS event scripts.

If CTDB_NFS_CALLOUT is unset or null then CTDB will use the provided nfs-linux-kernel-callout.

An example call-out for NFS-Ganesha, nfs-ganesha-callout, is provided as part of CTDB's documentation. This call-out has not been as extensively tested as nfs-linux-kernel-callout.
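
For example, to select a non-default call-out (the path here is illustrative; use the location where the call-out is actually installed):

 CTDB_NFS_CALLOUT="/etc/ctdb/nfs-ganesha-callout"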

Writing a call-out

A call-out should implement any required methods. Available methods are:

startup, shutdown
Startup or shutdown the entire NFS service.
start, stop
Start or stop a subset of services, as referenced from NFS checks (see below).
releaseip-pre, releaseip, takeip-pre, takeip
Take actions before or after an IP address is released or taken over during IP failover.
monitor-list-shares
List exported directories that should be monitored for existence. This can be used to ensure that cluster filesystems are mounted.
monitor-pre, monitor-post
Additional monitoring before or after the standard monitoring of RPC services (see below).
register
Should list the names of all implemented methods. This is an optimisation that stops the event scripts from calling unimplemented methods in the call-out.

See the existing call-outs for implementation details and suggested style.
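
A minimal sketch of a call-out is shown below. The service name mynfs and the exports list path are placeholders for a real NFS server, and the exact behaviour expected from each method (including the output format of register) should be confirmed against the call-outs shipped with CTDB:

 #!/bin/sh
 # Sketch of a CTDB NFS call-out: the first argument names the method.
 case "$1" in
 startup)
     # Bring up the entire NFS service
     service mynfs start
     ;;
 shutdown)
     # Take down the entire NFS service
     service mynfs stop
     ;;
 monitor-list-shares)
     # Print each exported directory, one per line, for existence checks
     cat /etc/mynfs/exports.list
     ;;
 register)
     # Advertise implemented methods so unimplemented ones are not called
     echo "startup"
     echo "shutdown"
     echo "monitor-list-shares"
     echo "register"
     ;;
 *)
     # Unimplemented methods do nothing
     exit 0
     ;;
 esac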

CTDB_NFS_CHECKS_DIR

This is the absolute pathname of a directory of files that describe how to monitor the desired NFS RPC services. The checks can also be configured to restart services that remain unresponsive.

If CTDB_NFS_CHECKS_DIR is unset or null then CTDB uses a set of NFS RPC checks in the nfs-checks.d subdirectory of the CTDB configuration directory.

When providing a different set of NFS RPC checks, create a new subdirectory, such as nfs-checks-enabled.d or nfs-checks-ganesha.d, and set CTDB_NFS_CHECKS_DIR to point to this directory. Populate the directory with custom check files and/or symbolic links to desired checks in the nfs-checks.d directory. This method is upgrade-safe: if you remove certain checks then they will not be reinstated when you upgrade CTDB to a newer version.
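
For example (the check file name 20.nfs.check and the /etc/ctdb configuration directory are illustrative; use the names and location on your system):

 mkdir /etc/ctdb/nfs-checks-enabled.d
 ln -s /etc/ctdb/nfs-checks.d/20.nfs.check /etc/ctdb/nfs-checks-enabled.d/
 CTDB_NFS_CHECKS_DIR="/etc/ctdb/nfs-checks-enabled.d"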

Writing NFS RPC check files

These files are described by the relevant README file (https://git.samba.org/?p=samba.git;a=blob;f=ctdb/config/nfs-checks.d/README). See the files shipped with CTDB for examples.
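
As a rough sketch, a check file pairs an RPC service with monitoring and restart behaviour, along the lines of the checks shipped with CTDB (the variable names and values here should be confirmed against the README above):

 # 20.nfs.check -- monitor the "nfs" RPC service
 version="3"
 unhealthy_after=2
 restart_every=4
 service_stop_cmd="$CTDB_NFS_CALLOUT stop nfs"
 service_start_cmd="$CTDB_NFS_CALLOUT start nfs"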

Troubleshooting

File handle consistency

If fail-over testing shows stale file handles or other unexpected issues, this may be due to a cluster filesystem providing inconsistent device numbers for an exported filesystem across the nodes of the cluster.

NFS implementations often use device numbers when constructing file handles. If file handles are constructed inconsistently across the cluster then this can result in stale file handles. In such cases you should test device and inode number uniformity of your cluster filesystem, as described on the Setting up a cluster filesystem page. If device numbers are inconsistent then it may or may not be possible to configure the NFS implementation to construct file handles using some other algorithm.
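
A quick way to compare device and inode numbers across the cluster (assuming a test file exists on the exported cluster filesystem):

 onnode all stat -c '%d %i' /clusterfs0/data/testfile

Every node should report the same pair of numbers.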

libnfs (https://github.com/sahlberg/libnfs) includes an example program called nfs-fh that can be used to check that file handles are constructed consistently across cluster nodes.

 # onnode -c all ./examples/nfs-fh nfs://127.0.0.1/testfile
 >> NODE: 10.1.1.1 <<
 43000be210dd17000000010000feffffffffffffff000000
 >> NODE: 10.1.1.2 <<
 43000be210dd17000000010000feffffffffffffff000000

In this case the file handles are consistent.