CTDB Performance

Record contention

CTDB's distributed volatile databases are subject to contention for database records. This can result in performance issues. Contention is most often seen in locking.tdb (and sometimes brlock.tdb). Records in these databases are directly associated with files, so when several nodes contend for access to metadata for a particular file or directory, the associated record(s) are contended and bounce between nodes.

In this situation it is important to understand that CTDB is only involved in creating records and moving them between nodes. smbd looks for a record in the desired TDB and, if it determines that the latest version of that record is present on the current node, uses that record directly. There are two other cases:

  • The record is present but the current node does not have the latest copy
  • The record is not present

In both cases smbd will ask ctdbd to fetch the record. So, when multiple nodes need to access the same record, that record will bounce between them. This can be expensive.

Log messages indicating poor performance

Log messages like the following are an indicator of performance problems:

 db_ctdb_fetch_locked for /var/cache/dbdir/volatile/locking.tdb.N key ABCDEFBC2A66F9AD1C55142C290000000000000000000000, chain 62588 needed 1 attempts, X milliseconds, chainlock: Y ms, CTDB Z ms

If Z is large (multiple seconds, particularly tens of seconds) then CTDB took a long time to fetch the record from another node.
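
To get a feel for how often this is happening, the smbd logs can simply be searched for these messages. A rough sketch, assuming smbd logs to /var/log/samba/log.smbd (the actual path depends on the distribution and the log file setting):

 # Show recent slow record fetches reported by smbd (the log path is an
 # assumption - adjust to the local logging configuration):
 grep db_ctdb_fetch_locked /var/log/samba/log.smbd | tail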

Aside: stuck smbd processes

If Z is even larger (hundreds or thousands of seconds) then this can indicate that an smbd process on a node is stuck in D state, probably in a cluster filesystem system call, while holding a TDB lock. In that case the above db_ctdb_fetch_locked messages may not even be seen, because a record is never successfully fetched. Instead, repeated messages like the following may be seen:

 Unable to get RECORD lock on database locking.tdb for X seconds

A very large value of X (hundreds or thousands of seconds) indicates a serious problem.

This can be confirmed by finding a long-running smbd process in D state and obtaining a kernel stack trace (on Linux, /proc/<pid>/stack). See the documentation for the ctdb.conf(5) [database] lock debug script option for an automated way of debugging this (when robust mutexes are in use, which is the modern Samba default, this automated method only works on versions >= 4.15).
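
A rough sketch of this manual check, run on each node (the exact ps columns are illustrative, and reading the kernel stack typically requires root):

 # List smbd processes currently in uninterruptible sleep (D state),
 # along with the kernel function they are blocked in (wchan):
 ps -eo pid,stat,wchan:32,etime,comm | awk '$2 ~ /^D/ && $NF == "smbd"'
 # For a suspect PID, dump its kernel stack:
 cat /proc/<pid>/stack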

As hinted at above, the usual reason for this type of problem is a cluster filesystem issue.

Hot keys

The hot keys section of the ctdb dbstatistics locking.tdb output lists the keys in locking.tdb that have been fetched to a node the most times. Substitute other database names as appropriate.
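
For example, run on a node of interest (the output will vary between nodes and over time; brlock.tdb is simply another commonly contended database):

 # Per-database statistics on this node; the hot keys section shows the
 # records fetched here most often:
 ctdb dbstatistics locking.tdb
 # Substitute another volatile database as needed, e.g.:
 ctdb dbstatistics brlock.tdb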

Workarounds

Deliberately breaking lock coherency

One workaround is to deliberately, but carefully, break lock coherency using:

 fileid:algorithm = fsname_norootdir

or even:

 fileid:algorithm = fsname_nodirs

See vfs_fileid(8). This needs to be carefully considered and understood to avoid filesystem corruption.
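
As an illustrative sketch only, to be adapted after reading vfs_fileid(8) and understanding the implications for the cluster filesystem in use, the algorithm is selected via the fileid VFS module in smb.conf:

 [global]
     # Example only: load the fileid VFS module and select an algorithm.
     # fsname_norootdir breaks lock coherency for each share's root
     # directory only; see vfs_fileid(8) for details and risks.
     vfs objects = fileid
     fileid:algorithm = fsname_norootdir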