CTDB's distributed volatile databases are subject to contention for database records. This can result in performance issues. Contention is most often seen in locking.tdb (and sometimes brlock.tdb). Records in these databases are directly associated with files, so when several nodes contend for access to metadata for a particular file or directory, the associated record(s) are contended and bounce between nodes.
In this situation it is important to understand that CTDB is only involved in creating records and moving them between nodes.
smbd looks for a record in the desired TDB and if it determines that the latest version of that record is present on the current node then it uses that record. There are 2 other cases:
- The record is present but the current node does not have the latest copy
- The record is not present
In both cases smbd will ask ctdbd to fetch the record. So, when multiple nodes need to access the same record, that record will bounce between them. This can be expensive.
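The cost of bouncing can be illustrated with a toy model. The following sketch is not CTDB code: it assumes a single record that lives on exactly one node at a time, and the function name count_migrations and the node labels are hypothetical. It only shows that the same number of accesses can cause very different numbers of record migrations depending on how the accesses interleave across nodes.

```python
from typing import Hashable, Iterable, Optional


def count_migrations(accesses: Iterable[Hashable]) -> int:
    """Count how often a single record moves between nodes.

    Toy model only: the record lives on exactly one node at a time, the
    first access creates it locally, and every later access from a
    different node forces a fetch (one migration).
    """
    holder: Optional[Hashable] = None
    migrations = 0
    for node in accesses:
        if holder is not None and node != holder:
            migrations += 1  # record has to be fetched from another node
        holder = node
    return migrations


# The same eight accesses, interleaved differently across two nodes.
batched = ["n0"] * 4 + ["n1"] * 4
interleaved = ["n0", "n1"] * 4
print(count_migrations(batched))      # 1
print(count_migrations(interleaved))  # 7
```

In this model each migration stands for one fetch via ctdbd, so the interleaved pattern pays the fetch cost on almost every access.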
Log messages indicating poor performance
Log messages like the following are an indicator of performance problems:
db_ctdb_fetch_locked for /var/cache/dbdir/volatile/locking.tdb.N key ABCDEFBC2A66F9AD1C55142C290000000000000000000000, chain 62588 needed 1 attempts, X milliseconds, chainlock: Y ms, CTDB Z ms
If Z is large (multiple seconds, and particularly tens of seconds) then CTDB took a long time to fetch the record from another node.
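As a rough aid for scanning logs, the sketch below extracts the timings from db_ctdb_fetch_locked messages and reports those where the CTDB time exceeds a threshold. The regular expression follows the field layout of the sample message above. The exact numeric formatting (integer or decimal) is an assumption, and the names FetchTiming and slow_fetches, the one-second default threshold, and the sample values are all hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class FetchTiming(NamedTuple):
    db: str
    key: str
    attempts: int
    total_ms: float
    chainlock_ms: float
    ctdb_ms: float


# Field layout copied from the sample message; whether the numbers are
# printed as integers or decimals is an assumption, so both are accepted.
_NUM = r"\d+(?:\.\d+)?"
_FETCH_RE = re.compile(
    r"db_ctdb_fetch_locked for (?P<db>\S+) key (?P<key>[0-9A-Fa-f]+), "
    r"chain \d+ needed (?P<attempts>\d+) attempts, "
    rf"(?P<total>{_NUM}) milliseconds, "
    rf"chainlock: (?P<chainlock>{_NUM}) ms, "
    rf"CTDB (?P<ctdb>{_NUM}) ms"
)


def slow_fetches(lines: Iterable[str],
                 threshold_ms: float = 1000.0) -> Iterator[FetchTiming]:
    """Yield timings for messages whose CTDB time is >= threshold_ms."""
    for line in lines:
        m = _FETCH_RE.search(line)
        if m is None:
            continue  # not a db_ctdb_fetch_locked message
        timing = FetchTiming(
            db=m["db"],
            key=m["key"],
            attempts=int(m["attempts"]),
            total_ms=float(m["total"]),
            chainlock_ms=float(m["chainlock"]),
            ctdb_ms=float(m["ctdb"]),
        )
        if timing.ctdb_ms >= threshold_ms:
            yield timing


# Made-up values for illustration; not taken from a real log.
sample = (
    "db_ctdb_fetch_locked for /var/cache/dbdir/volatile/locking.tdb.0 "
    "key ABCDEFBC2A66F9AD1C55142C290000000000000000000000, "
    "chain 62588 needed 1 attempts, 12005 milliseconds, "
    "chainlock: 0.002 ms, CTDB 12004.5 ms"
)
for t in slow_fetches([sample]):
    print(f"{t.db}: key {t.key} took {t.ctdb_ms / 1000:.1f}s in CTDB")
```

Grouping the results by key is a natural next step, since a key that appears repeatedly points at a single contended file or directory.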
Aside: stuck smbd processes
If Z is even larger (hundreds or thousands of seconds) then this can indicate that an smbd process on a node is stuck in D state, probably in a cluster filesystem system call, while holding a TDB lock. In this case the above db_ctdb_fetch_locked messages may not even be seen, because a record is never successfully fetched. Instead, one or more repeated messages like the following may be seen:
Unable to get RECORD lock on database locking.tdb for X seconds
A very large value of X (hundreds or thousands of seconds) indicates a serious problem.
This can be confirmed by finding a long-running smbd process in D state and obtaining a kernel stack trace (on Linux, /proc/<pid>/stack). See the documentation for the lock debug script option in the [database] section of ctdb.conf(5) for an automated way of debugging this. When robust mutexes are in use, which is the modern Samba default, this automated method only works on versions >= 4.15.
As hinted at above, the usual reason for this type of problem is a cluster filesystem issue.
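The manual check described above can be sketched as follows, assuming Linux. The script scans /proc for processes whose command name is smbd and whose state is D, then tries to read the kernel stack for each. The function names are hypothetical, reading /proc/<pid>/stack normally requires root, and a single snapshot cannot tell how long a process has been in D state, so repeat the check before drawing conclusions.

```python
import os
from typing import List, Optional, Tuple


def parse_stat(stat_text: str) -> Tuple[str, str]:
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')'.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return comm, state


def read_text(path: str) -> Optional[str]:
    """Read a file, returning None if it vanished or is not readable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


def smbd_in_d_state(proc_root: str = "/proc") -> List[Tuple[int, Optional[str]]]:
    """Return (pid, kernel stack or None) for each smbd in D state."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if stat_text is None:
            continue  # process exited while scanning
        try:
            comm, state = parse_stat(stat_text)
        except (ValueError, IndexError):
            continue  # unexpected format; skip rather than guess
        if comm == "smbd" and state == "D":
            stack = read_text(os.path.join(proc_root, entry, "stack"))
            found.append((int(entry), stack))
    return found


if __name__ == "__main__":
    for pid, stack in smbd_in_d_state():
        print(f"smbd pid {pid} is in D state")
        print(stack if stack is not None else "(stack unreadable; try as root)")
```

A stack trace that ends in cluster filesystem functions is consistent with the filesystem being the cause.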
The hot keys section of the ctdb dbstatistics locking.tdb output lists the keys in locking.tdb that have been fetched to a node the most times. Substitute other database names as appropriate.
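To collect these statistics for several databases in one go, a small wrapper along the following lines could be used. It is only a sketch: the helper names are hypothetical, it assumes the ctdb command is on PATH on a cluster node, and it prints the command output unmodified rather than assuming anything about its layout.

```python
import subprocess
from typing import Iterable, List


def dbstatistics_cmd(db_name: str = "locking.tdb") -> List[str]:
    """Build the argument list for 'ctdb dbstatistics <db_name>'."""
    return ["ctdb", "dbstatistics", db_name]


def show_dbstatistics(db_names: Iterable[str]) -> None:
    """Run ctdb dbstatistics for each database and print the raw output."""
    for db_name in db_names:
        cmd = dbstatistics_cmd(db_name)
        print("$ " + " ".join(cmd))
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, check=False
            )
        except FileNotFoundError:
            print("ctdb command not found; run this on a cluster node")
            return
        # Show stderr when the command produced no regular output.
        print(result.stdout or result.stderr)
```

For example, show_dbstatistics(["locking.tdb", "brlock.tdb"]) prints the statistics, including the hot keys section, for both databases on the node where it runs.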
Deliberately breaking lock coherency
Lock coherency can be deliberately, but carefully, broken by setting one of the following:
fileid:algorithm = fsname_norootdir
fileid:algorithm = fsname_nodirs
See vfs_fileid(8). This needs to be carefully considered and understood to avoid filesystem corruption.