Upgrading a CTDB cluster

From SambaWiki
Revision as of 05:08, 5 July 2017 by MartinSchwenke (talk | contribs) (→‎Policy: Add note about enforcement and AllowMixedVersions tunable)

Rolling Upgrades

Some people are fond of rolling upgrades, where services remain continuously available while the cluster is being upgraded. This makes a certain amount of sense, given that a CTDB cluster provides high availability.

One way of accomplishing a rolling upgrade is:

  1. Shut down CTDB on half of the nodes in the cluster. This also shuts down the services managed by CTDB (e.g. Samba) on those nodes.
  2. Upgrade CTDB (and, if applicable, Samba and other software) on those nodes.
  3. Restart CTDB on those nodes.
  4. Repeat for the remaining nodes in the cluster.
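
The procedure above can be sketched as a simple ordered plan. This is only an illustration, not a CTDB tool: the function name plan_rolling_upgrade, the node names and the step descriptions are hypothetical, and the actual commands for stopping, upgrading and starting CTDB depend on your distribution.

```python
from typing import List, Tuple


def plan_rolling_upgrade(nodes: List[str]) -> List[Tuple[str, List[str]]]:
    """Return an ordered list of (action, nodes) steps for a two-batch upgrade.

    Hypothetical helper: it only describes the order of operations and
    does not run any commands.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")

    half = len(nodes) // 2
    # The first batch is (roughly) half the cluster; the rest keep serving.
    batches = [nodes[:half], nodes[half:]]

    plan = []
    for batch in batches:
        plan.append(("stop CTDB", batch))
        plan.append(("upgrade CTDB (and Samba, if applicable)", batch))
        plan.append(("start CTDB", batch))
    return plan


if __name__ == "__main__":
    for action, batch in plan_rolling_upgrade(["n0", "n1", "n2", "n3"]):
        print(f"{action}: {', '.join(batch)}")
```

Note that the second batch is only stopped after the first batch has been restarted, so at every point in the plan some nodes are running CTDB.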

Problems with Rolling Upgrades

CTDB and Samba are under constant development. There can be large and incompatible changes between versions. This means that the CTDB versions running on different nodes can be incompatible with each other. Data structures that Samba stores in TDBs managed by CTDB can also change.

As a result, the upgraded half of a cluster may not be able to interoperate with the other half of the cluster.


Policy

We attempt to implement the following policy in CTDB:

  • For major and minor version updates (i.e. X.Y.Z to X'.Y.Z or X.Y'.Z), rolling upgrades are not supported. Please shut down CTDB on all nodes, install upgrades and then restart CTDB on all nodes.
  • For releases within a minor version (i.e. X.Y.Z to X.Y.Z'), rolling upgrades will work unless otherwise stated.
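
The version rule above can be expressed as a small check. This is a minimal sketch that assumes plain numeric X.Y.Z version strings; the function names are hypothetical, and the check cannot know about releases where the notes state that a rolling upgrade will not work.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'X.Y.Z' string into integers (assumed version format)."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def rolling_upgrade_supported(old: str, new: str) -> bool:
    """True when only Z changes, i.e. X.Y.Z to X.Y.Z'."""
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    return (old_major, old_minor) == (new_major, new_minor)
```

With illustrative version numbers, rolling_upgrade_supported("4.6.2", "4.6.5") returns True, while rolling_upgrade_supported("4.6.5", "4.7.0") returns False.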

This policy is now enforced by CTDB: incompatible CTDB versions will automatically shut down. This behaviour can be disabled via the AllowMixedVersions tunable. See ctdb-tunables(7) for details.
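
Before an upgrade it can be useful to check how the tunable is currently set on a node. The sketch below runs ctdb getvar AllowMixedVersions and parses its output. It assumes that the command prints a single line of the form NAME = VALUE, which should be checked against the output of your own CTDB version. The helper names are hypothetical.

```python
import subprocess


def parse_tunable(line: str) -> tuple:
    """Parse 'NAME = VALUE' (assumed output format) into (name, int value)."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"unexpected tunable line: {line!r}")
    return name.strip(), int(value.strip())


def mixed_versions_allowed() -> bool:
    """Ask the local CTDB daemon whether AllowMixedVersions is non-zero.

    Requires a running ctdbd on this node.
    """
    result = subprocess.run(
        ["ctdb", "getvar", "AllowMixedVersions"],
        capture_output=True, text=True, check=True,
    )
    _, value = parse_tunable(result.stdout.strip())
    return value != 0
```

The parsing is kept separate from the command invocation so that it can be exercised without a running cluster.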


Recommendations

  • Always perform a test upgrade on a test cluster to see if the upgrade works as expected
  • Remember that problems with rolling upgrades are most likely to occur when there is an active (SMB) workload. If there is no workload, then you don't need a rolling upgrade
  • You should probably schedule a maintenance window for an upgrade in case things go wrong. If you have to do that, then you might as well briefly take down the whole cluster