Upgrading a CTDB cluster

From SambaWiki

Rolling Upgrades

Some people are fond of rolling upgrades, in which services remain continuously available throughout the upgrade. This makes a certain amount of sense, given that a CTDB cluster provides high availability.

One way of accomplishing a rolling upgrade is:

  1. Shut down CTDB on half of the nodes in the cluster. This also shuts down the services managed by CTDB (e.g. Samba).
  2. Upgrade CTDB (and, if applicable, Samba and other software).
  3. Restart CTDB.
  4. Repeat for the remaining nodes in the cluster.
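The procedure above can be sketched as a small Python planner that splits the node list into two halves and lists the steps in order. This is only an illustration: the helper name, the node names and the wording of each step are hypothetical, and the function merely prints a plan. The actual commands for stopping, upgrading and starting CTDB depend on your distribution and packaging.

```python
from typing import List, Tuple


def rolling_upgrade_plan(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return (node, action) pairs for a two-phase rolling upgrade.

    Hypothetical helper: it only describes the order of operations and
    does not talk to CTDB, Samba or any package manager.
    """
    if len(nodes) < 2:
        raise ValueError("a rolling upgrade needs at least two nodes")
    half = len(nodes) // 2
    # Phase 1 takes the first half (the smaller half for an odd node count,
    # so more nodes keep serving clients); phase 2 takes the rest.
    phases = [nodes[:half], nodes[half:]]
    plan: List[Tuple[str, str]] = []
    for phase in phases:
        # Stopping CTDB also stops the services it manages (e.g. Samba).
        for node in phase:
            plan.append((node, "stop ctdb"))
        for node in phase:
            plan.append((node, "upgrade ctdb (and samba, if applicable)"))
        for node in phase:
            plan.append((node, "start ctdb"))
    return plan


if __name__ == "__main__":
    for node, action in rolling_upgrade_plan(["node0", "node1", "node2", "node3"]):
        print(f"{node}: {action}")
```

Each phase finishes restarting its nodes before the next phase stops any, so at no point is the whole cluster down.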

Problems with Rolling Upgrades

CTDB and Samba are under constant development. There can be large and incompatible changes between versions, so CTDB daemons running different versions on different nodes may not be compatible with each other. Data structures that Samba stores in TDBs managed by CTDB can also change.

As a result, the upgraded half of a cluster may not be able to interoperate with the other half.

Policy

We attempt to implement the following policy in CTDB:

  • For major and minor version updates (i.e. X.Y.Z to X'.Y.Z or X.Y'.Z), rolling upgrades are not supported. Please shut down CTDB on all nodes, install the upgrades and then restart CTDB on all nodes.
  • For releases within a minor version (i.e. X.Y.Z to X.Y.Z'), rolling upgrades will work unless otherwise stated.

This policy is implemented by CTDB: incompatible CTDB versions will automatically shut down. This behaviour can be disabled via the AllowMixedVersions tunable option (see ctdb-tunables(7) for details).
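The version rule above can be modelled in a few lines of Python. This is a sketch of the stated policy, not CTDB's own code: the function names are hypothetical, version strings are assumed to be plain "X.Y.Z" (suffixes such as "-rc1" are not handled), and the `allow_mixed_versions` flag stands in for the AllowMixedVersions tunable.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'X.Y.Z' version string into three integers.

    Assumption: release-candidate or vendor suffixes are not handled.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def rolling_upgrade_supported(old: str, new: str) -> bool:
    """Per the stated policy, only X.Y.Z -> X.Y.Z' (same major and minor) qualifies."""
    return parse_version(old)[:2] == parse_version(new)[:2]


def node_keeps_running(local: str, peer: str, allow_mixed_versions: bool = False) -> bool:
    """Hypothetical model of the automatic shutdown: an incompatible peer
    version stops the node unless the AllowMixedVersions override is set."""
    return allow_mixed_versions or rolling_upgrade_supported(local, peer)


if __name__ == "__main__":
    print(rolling_upgrade_supported("4.18.1", "4.18.5"))  # True
    print(rolling_upgrade_supported("4.18.5", "4.19.0"))  # False
    print(node_keeps_running("4.18.5", "4.19.0", allow_mixed_versions=True))  # True
```

Note that in this model the override only suppresses the shutdown; it does not make the two versions compatible.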

However, if (as is likely) you are using CTDB to manage clustered Samba, the situation is more complex.

Samba policy

Different Samba releases (even within a minor version) can contain different definitions of binary structures stored in TDBs and can use different semantics for cluster communication elements. This means that it can be unsafe to do a rolling upgrade with Samba running.

Therefore, a policy has been implemented in smbd that prevents a new version from starting on one node if other versions are running on other nodes. This behaviour can be disabled via the allow unsafe cluster upgrade configuration option (see smb.conf(5) for details). Do not use this option carelessly: it can lead to smbd crashes or even data corruption.
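The difference from the CTDB policy is that smbd refuses any version mismatch, not just a change of major or minor version. A hypothetical Python model of that startup check is shown below. It assumes versions are compared as exact strings and that the versions running on the other nodes are already known; how smbd actually discovers and compares them is not modelled here. The `allow_unsafe_cluster_upgrade` flag stands in for the smb.conf option of the same name.

```python
from typing import Iterable


def smbd_may_start(local_version: str, running_versions: Iterable[str],
                   allow_unsafe_cluster_upgrade: bool = False) -> bool:
    """Hypothetical model of the smbd startup policy.

    running_versions: Samba versions currently running on the other nodes.
    Any difference at all, even in the patch level, blocks startup unless
    the unsafe override is set.
    """
    if allow_unsafe_cluster_upgrade:
        # Override: start regardless of the versions on other nodes.
        return True
    # An empty cluster (no other smbd running) trivially allows startup.
    return all(version == local_version for version in running_versions)


if __name__ == "__main__":
    print(smbd_may_start("4.19.2", ["4.19.1", "4.19.1"]))
    print(smbd_may_start("4.19.2", ["4.19.1"], allow_unsafe_cluster_upgrade=True))
```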

Suggestions

  • Always perform a test upgrade on a test cluster to see if the upgrade works as expected
  • Remember that problems with a rolling upgrade are most likely to occur when there is an active (SMB) workload, and if there is no workload, you don't need a rolling upgrade
  • You should probably schedule a maintenance window for an upgrade in case things go wrong, and once you have one, you might as well briefly take down the whole cluster