Setting up CTDB for Clustered NFS

From SambaWiki
Revision as of 04:07, 21 October 2016 by MartinSchwenke (moved page Setting Up CTDB For Clustered NFS to Setting up CTDB for Clustered NFS: Title looks better...)

First steps

Configure CTDB (see the CTDB_Setup page) and set it up to use public IP addresses. Verify that the CTDB cluster works.
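One way to do the verification step is to ask CTDB for the node states and the public address assignment. The following is a minimal sketch, assuming the ctdb command-line tool is in the PATH on the node; the function name check_ctdb is illustrative only and is not part of CTDB.

```shell
# Sketch: basic CTDB health check, run on any cluster node.
# check_ctdb is a hypothetical helper name, not something CTDB ships.
check_ctdb() {
    if ! command -v ctdb >/dev/null 2>&1; then
        echo "ctdb command not found" >&2
        return 1
    fi
    # Node states; a healthy cluster reports every node as OK.
    ctdb status || return 1
    # Public IP addresses and the node currently hosting each one.
    ctdb ip || return 1
}
```

Call check_ctdb on a node after CTDB has started; a non-zero exit status means one of the two commands failed or ctdb is not installed.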

/etc/exports

Export the same directory from all nodes. Specify the fsid export option so that every node presents the same fsid to clients; clients can get "upset" if the fsid on a mount suddenly changes.

 /gpfs0/data *(rw,fsid=1235)
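Since the fsid has to match on every node, it can help to extract it from each node's exports file and compare the values. The helper below is a sketch only; export_fsid is a hypothetical name, and it assumes one export per line with the options in parentheses, as in the example above.

```shell
# Print the fsid configured for one exported directory.
# Usage: export_fsid EXPORTS_FILE DIRECTORY
# export_fsid is a hypothetical helper, not part of any NFS package.
export_fsid() {
    awk -v dir="$2" '
        $1 == dir && match($0, /fsid=[^,)]+/) {
            # RSTART/RLENGTH delimit "fsid=<value>"; skip the 5 characters of "fsid="
            print substr($0, RSTART + 5, RLENGTH - 5)
        }' "$1"
}
```

Running export_fsid /etc/exports /gpfs0/data on each node (for example over ssh) should print the same value everywhere.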

/etc/sysconfig/nfs

This file must be edited so that statd keeps its state directory on shared storage instead of in a local directory. statd must also listen on a fixed port that is the same on all nodes in the cluster. Without a fixed port, the statd port will change during failover, which causes problems on some clients.

This file should look something like:

 CTDB_MANAGES_NFS=yes
 NFS_TICKLE_SHARED_DIRECTORY=/gpfs0/nfs-tickles
 STATD_PORT=595
 STATD_OUTGOING_PORT=596
 MOUNTD_PORT=597
 RQUOTAD_PORT=598
 LOCKD_UDPPORT=599
 LOCKD_TCPPORT=599
 STATD_SHARED_DIRECTORY=/gpfs0/nfs-state
 NFS_HOSTNAME="ctdb"
 STATD_HOSTNAME="$NFS_HOSTNAME -P "$STATD_SHARED_DIRECTORY/$PUBLIC_IP" -H /etc/ctdb/statd-callout -p 97"
 RPCNFSDARGS="-N 4"

The CTDB_MANAGES_NFS line tells the event scripts that CTDB is to manage startup and shutdown of the NFS and NFSLOCK services. With this set to yes, CTDB will start/stop/restart these services as required.

STATD_SHARED_DIRECTORY is the shared directory where statd and the statd-callout script expect to find the state variables and the lists of clients to notify.

The IP address specified ($PUBLIC_IP in the STATD_HOSTNAME line) should be the public address of this node.

The lock manager ports are fixed so that the port used on a public address does not change during address failover/failback, since such a change can confuse some clients.
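Because the fixed ports must be identical on all nodes, a quick consistency check is to print the port settings from each node's file and compare the output. This is a sketch under the assumption that the file uses plain NAME=value lines as in the example; show_nfs_ports is a hypothetical helper name.

```shell
# Print the fixed-port settings from a sysconfig file, one NAME=value per line,
# and return non-zero if any of them is missing.
# Usage: show_nfs_ports FILE   (show_nfs_ports is a hypothetical helper)
show_nfs_ports() {
    file=$1
    status=0
    for name in STATD_PORT STATD_OUTGOING_PORT MOUNTD_PORT RQUOTAD_PORT LOCKD_UDPPORT LOCKD_TCPPORT; do
        # Take the last assignment, as would happen when the file is sourced.
        value=$(sed -n "s/^[[:space:]]*$name=//p" "$file" | tail -n 1)
        if [ -z "$value" ]; then
            echo "$name is not set in $file" >&2
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}
```

The output of show_nfs_ports /etc/sysconfig/nfs should be the same on every node. Once the services are running, rpcinfo -p on each node lists the ports that were actually registered.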

NFS_TICKLE_SHARED_DIRECTORY is where CTDB stores information about which clients have established TCP connections to the cluster. This information is used during failover of IP addresses: it allows the node that takes over an IP address to very quickly 'tickle' and reset any TCP connections for that address. This helps the client detect sooner that its TCP connection for NFS needs to be reestablished, which speeds up recovery in the client.

NFS_HOSTNAME is the name that the NFS server will use for the public addresses. This should be the same as the name Samba uses, and it must resolve to the IP addresses used as public addresses.

The RPCNFSDARGS line disables support for NFSv4, which is not yet supported by CTDB.
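The requirement that the name resolves to the public addresses can be checked with getent. The sketch below assumes a glibc system where getent ahosts is available; resolves_to is a hypothetical helper name, and the address in the usage line is a placeholder, not a value from this setup.

```shell
# Check that a hostname resolves to a given public address.
# Usage: resolves_to NAME IP   (resolves_to is a hypothetical helper)
resolves_to() {
    # getent ahosts prints one "ADDRESS TYPE [NAME]" line per result;
    # succeed if any address column equals the wanted IP.
    getent ahosts "$1" | awk -v ip="$2" '$1 == ip { found = 1 } END { exit !found }'
}
```

For example, resolves_to ctdb 10.1.1.1 exits with status 0 only if the placeholder address 10.1.1.1 is among the addresses returned for the name ctdb. Repeat the check for each public address.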

chkconfig

Since CTDB will manage and start/stop/restart the nfs and nfslock services, you must disable them in chkconfig so that they are not started outside CTDB's control.

 chkconfig nfs off
 chkconfig nfslock off

Statd state directories

Create a state directory on shared storage where the statd daemon on each node can keep its state information. It must be on shared storage because a node that takes over an IP address needs to find the list of monitored clients to notify.

 mkdir /gpfs0/nfs-state
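The STATD_HOSTNAME line in the configuration points statd at "$STATD_SHARED_DIRECTORY/$PUBLIC_IP", so it may be convenient to pre-create one subdirectory per public address. The following sketch does that under two assumptions that are not stated on this page: that a per-address layout is wanted, and that the public addresses are listed one per line as ADDRESS/MASK INTERFACE in a file such as /etc/ctdb/public_addresses. The name make_statd_dirs is hypothetical.

```shell
# Create one statd state directory per public address on shared storage.
# Usage: make_statd_dirs STATE_DIR PUBLIC_ADDRESSES_FILE
# make_statd_dirs is a hypothetical helper; the per-address layout is an assumption.
make_statd_dirs() {
    state_dir=$1
    addresses_file=$2
    mkdir -p "$state_dir" || return 1
    # Drop anything after '#', then read "ADDRESS/MASK INTERFACE" pairs.
    sed -e 's/#.*//' "$addresses_file" | while read -r addr _iface; do
        [ -n "$addr" ] || continue
        # Strip the "/MASK" suffix so the directory is named after the address.
        mkdir -p "$state_dir/${addr%%/*}" || exit 1
    done
}
```

A call such as make_statd_dirs /gpfs0/nfs-state /etc/ctdb/public_addresses only needs to be run once, from any node, because the directory is on shared storage.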

Event scripts

CTDB clustering for NFS relies on two event scripts: /etc/ctdb/events.d/60.nfs and /etc/ctdb/events.d/61.nfstickle. Both are provided by the RPM package and should not need to be changed.
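To confirm that both scripts are installed on a node, a small check such as the following can be used. It is a sketch: check_event_scripts is a hypothetical helper name, and it assumes that a script must be executable for CTDB to run it.

```shell
# Check that the NFS event scripts are present and executable.
# Usage: check_event_scripts [EVENTS_DIR]
# check_event_scripts is a hypothetical helper; EVENTS_DIR defaults to /etc/ctdb/events.d.
check_event_scripts() {
    events_dir=${1:-/etc/ctdb/events.d}
    status=0
    for script in 60.nfs 61.nfstickle; do
        if [ -x "$events_dir/$script" ]; then
            echo "ok: $events_dir/$script"
        else
            echo "missing or not executable: $events_dir/$script" >&2
            status=1
        fi
    done
    return $status
}
```

Running check_event_scripts without arguments on each node checks the default location and returns a non-zero status if either script is missing.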

IMPORTANT

Never ever mount the same NFS share on a client from two different nodes in the cluster at the same time. The client-side caching in NFS is very fragile and relies on the assumption that an object can only be accessed through one single path at a time.
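On a Linux client, a rough way to spot this mistake is to look for the same export path mounted from more than one server. The sketch below is a heuristic only: duplicate_nfs_mounts is a hypothetical helper name, it compares the server strings exactly as they appear in the mount table, and it assumes servers are given as hostnames or IPv4 addresses.

```shell
# List exports that are mounted from more than one server.
# Usage: duplicate_nfs_mounts [MOUNTS_FILE]
# duplicate_nfs_mounts is a hypothetical helper; MOUNTS_FILE defaults to /proc/mounts.
duplicate_nfs_mounts() {
    awk '
        $3 == "nfs" || $3 == "nfs4" {
            # Field 1 is "server:/export"; split at the first colon.
            i = index($1, ":")
            server = substr($1, 1, i - 1)
            path = substr($1, i + 1)
            if ((path in seen) && (seen[path] != server))
                print path " is mounted from both " seen[path] " and " server
            else
                seen[path] = server
        }' "${1:-/proc/mounts}"
}
```

Empty output from duplicate_nfs_mounts means no export path appears with two different server names in the client's mount table.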