Upgrading a CTDB cluster

From SambaWiki
Revision as of 10:23, 18 March 2017 by MartinSchwenke

Rolling Upgrades

Some people are fond of rolling upgrades, in which services remain available throughout the upgrade. This makes a certain amount of sense, given that a CTDB cluster provides high availability.

One way of accomplishing a rolling upgrade is:

  1. Shut down CTDB on half of the nodes in the cluster. Doing this will also shut down services managed by CTDB (e.g. Samba) on those nodes
  2. Upgrade CTDB (and, if applicable, Samba and other software) on those nodes
  3. Restart CTDB on those nodes
  4. Repeat for the remaining nodes in the cluster
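
The ordering of the steps above can be sketched in Python. This is only an illustration of the sequencing, not a tool shipped with CTDB: the function name, the node names and the action labels are all made up for the example, and the actual commands for stopping, upgrading and starting CTDB depend on how it is packaged on your distribution.

```python
from typing import List, Tuple

# Hypothetical action labels. They stand in for whatever commands your
# distribution uses to stop CTDB, install packages and start CTDB.
ACTIONS = ("stop CTDB", "upgrade packages", "start CTDB")


def rolling_upgrade_plan(nodes: List[str]) -> List[Tuple[str, str]]:
    """Return ordered (node, action) steps for a two-batch rolling upgrade.

    The first batch is half of the nodes (rounded down); the second batch
    is the remaining nodes. Each batch is stopped, upgraded and restarted
    before the next batch is touched.
    """
    half = len(nodes) // 2
    batches = [nodes[:half], nodes[half:]]

    plan: List[Tuple[str, str]] = []
    for batch in batches:
        for action in ACTIONS:
            for node in batch:
                plan.append((node, action))
    return plan


if __name__ == "__main__":
    # Example node names are placeholders.
    for node, action in rolling_upgrade_plan(["n1", "n2", "n3", "n4"]):
        print(f"{node}: {action}")
```

The sketch only produces the order of operations. It does not check that the first batch has rejoined the cluster before the second batch is stopped, which a real procedure would need to do.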

Problems with Rolling Upgrades

CTDB and Samba are under constant development. There can be large and incompatible changes between versions. This means that CTDB on an upgraded node can be incompatible with CTDB on a node that has not yet been upgraded. Data structures that Samba stores in TDBs managed by CTDB can also change.

This can mean that the upgraded half of a cluster may not be able to interoperate with the other half of the cluster.

Policy

We attempt to implement the following policy in CTDB:

  • For major and minor version updates (i.e. X.Y.Z to X'.Y.Z or X.Y'.Z), rolling upgrades are not supported. Please shut down CTDB on all nodes, install upgrades and then restart CTDB on all nodes.
  • For releases within a minor version (i.e. X.Y.Z to X.Y.Z'), rolling upgrades will work unless otherwise stated.
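
The policy can be expressed as a small version comparison. The following Python sketch assumes plain numeric X.Y.Z version strings; the function names are invented for this example, and the version numbers in the usage lines are only illustrative. A True result means only that a rolling upgrade is expected to work under the policy. It cannot account for the "unless otherwise stated" exceptions, so the release notes still need to be checked.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain numeric 'X.Y.Z' string into (X, Y, Z) integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def rolling_upgrade_expected_to_work(current: str, target: str) -> bool:
    """Apply the policy: only X.Y.Z to X.Y.Z' is a rolling-upgrade candidate.

    Any change to X or Y means CTDB should be shut down on all nodes,
    upgraded, and then restarted on all nodes.
    """
    cur_major, cur_minor, _ = parse_version(current)
    new_major, new_minor, _ = parse_version(target)
    return (cur_major, cur_minor) == (new_major, new_minor)


if __name__ == "__main__":
    # Illustrative version numbers only.
    print(rolling_upgrade_expected_to_work("4.6.1", "4.6.3"))
    print(rolling_upgrade_expected_to_work("4.6.1", "4.7.0"))
```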

Suggestions

  • Always perform a test upgrade on a test cluster to see if the upgrade works as expected
  • Remember that problems with a rolling upgrade are most likely to occur if there is an active (SMB) workload... and if there is no workload then you don't need a rolling upgrade
  • You should probably schedule a maintenance window for an upgrade in case things go wrong. If you have to do that anyway, you might as well briefly take down the whole cluster