Clustered Samba

Current Samba implementations cannot be used directly on clustered systems because they are designed for single-server systems. The main reason is that smbd processes use TDB (a local database) for messaging, storing shared data, and so on, so there is no way to coordinate smbd processes running on different cluster nodes.

A simple clustered Samba system looks like this:

Each node runs its own smbd daemon. The daemons must communicate with each other to avoid corrupting shared data, to handle oplocks, and so on. The main problem is therefore extending the locking subsystem (share mode locks, opportunistic locks, byte-range locks) to a multi-node system with a cluster-wide view.

The Locking Problem

Here we gather information on how the locking mechanisms are implemented in Samba3.

Locking is used in various file operations (opening, closing, deleting), but the main place is the large open_file_ntcreate function (in fact, open_file_ntcreate provides a high-level interface to share access testing).
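To make "share access testing" concrete, here is a minimal sketch of the kind of check involved. It is a simplified model, not Samba's code: the constants are the standard Windows access and share bits, while the function name and the reduced set of checks are assumptions made for illustration.

```python
# Standard Windows access-mask and share-access bits.
FILE_READ_DATA = 0x0001
FILE_WRITE_DATA = 0x0002
FILE_APPEND_DATA = 0x0004
DELETE = 0x00010000

FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
FILE_SHARE_DELETE = 0x4

# Each access class must be permitted by the other opener's share mode.
_CHECKS = [
    (FILE_READ_DATA, FILE_SHARE_READ),
    (FILE_WRITE_DATA | FILE_APPEND_DATA, FILE_SHARE_WRITE),
    (DELETE, FILE_SHARE_DELETE),
]


def _denied(access, share, access_bits, share_bit):
    """True if `access` asks for something that `share` does not allow."""
    return bool(access & access_bits) and not (share & share_bit)


def share_conflict(existing_access, existing_share, new_access, new_share):
    """Hypothetical helper: does a new open conflict with an existing one?"""
    for access_bits, share_bit in _CHECKS:
        # The existing opener's access must be allowed by the new share mode.
        if _denied(existing_access, new_share, access_bits, share_bit):
            return True
        # The new opener's access must be allowed by the existing share mode.
        if _denied(new_access, existing_share, access_bits, share_bit):
            return True
    return False
```

Two readers that both share read access do not conflict; a writer conflicts with an existing opener that did not grant FILE_SHARE_WRITE.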

Share mode information consists of two parts: share mode info and deferred open info. The former is a collection of file sharing parameters (file name, list of processes that have opened the file, oplock info, etc.). The latter is part of the mechanism for deferring open calls.
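The two parts could be modeled roughly as follows. This is only a sketch of the shape of the data: all class and field names are illustrative assumptions and do not reproduce Samba's actual structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShareModeEntry:
    """One process that currently has the file open (illustrative fields)."""
    pid: int
    share_access: int
    oplock_type: int = 0  # 0 = no oplock in this model


@dataclass
class DeferredOpenEntry:
    """One open call that was postponed and will be retried later."""
    pid: int
    request_id: int  # hypothetical identifier of the pending request


@dataclass
class ShareModeRecord:
    """Everything stored for one file: share mode info plus deferred opens."""
    filename: str
    share_modes: List[ShareModeEntry] = field(default_factory=list)
    deferred_opens: List[DeferredOpenEntry] = field(default_factory=list)

    def opener_pids(self) -> List[int]:
        """Return the pids of all processes that have the file open."""
        return [entry.pid for entry in self.share_modes]
```

For example, a record for a file opened by process 100, with a second open from process 200 deferred, would carry one entry in each list.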

The original smbd locking operations include:

  • local file operations (file renaming and deleting, POSIX locking, etc.);
  • modifications of the locking storage (locking.tdb);
  • sending messages to other lockers.

All these actions are performed under a tdb chain lock to avoid file system or locking.tdb conflicts.
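The pattern of doing all three steps under one lock can be sketched as below. This is a single-process model under stated assumptions: a per-key threading lock stands in for the tdb chain lock, an in-memory dictionary stands in for locking.tdb, and every name (LockingStore, open_under_chain_lock, notify) is hypothetical.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class LockingStore:
    """In-memory stand-in for locking.tdb (illustrative only)."""

    def __init__(self):
        self._records = {}                      # key -> list of opener pids
        self._chain_locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()          # protects the lock table
        self.sent = []                          # messages "sent" to lockers

    @contextmanager
    def chain_lock(self, key):
        """Hold the lock for `key` and yield its record for modification."""
        with self._guard:
            lock = self._chain_locks[key]
        with lock:
            yield self._records.setdefault(key, [])

    def notify(self, pid, text):
        """Record a message to another locker (no real IPC in this model)."""
        self.sent.append((pid, text))


def open_under_chain_lock(store, key, pid, local_op):
    """Run the three kinds of action while the record is locked."""
    with store.chain_lock(key) as openers:
        local_op()                              # 1. local file operation
        others = [p for p in openers if p != pid]
        openers.append(pid)                     # 2. modify locking storage
        for other in others:                    # 3. message other lockers
            store.notify(other, f"{key} opened by {pid}")
    return others
```

Because the record is locked for the whole sequence, a second opener of the same file cannot observe the storage between the file operation and the update.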

The present locking mechanism uses a single locking.tdb (a database of shared information: share mode entries, deferred open entries, flags, etc.) for all smbd processes.

Course of development

On clustered systems, internal messaging can be used to interconnect smbd processes and locking databases.

This approach implies too many messaging operations between smbd processes while locking or unlocking a file. They can be reduced noticeably if all locking information is stored on a single node, the one running the locking daemon.
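The saving can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: the cost model (one request and one reply per contacted node), the LockDaemon class, and its request method are hypothetical and do not describe an existing Samba component.

```python
def peer_to_peer_messages(nodes):
    """Assumed cost when every other node's database must be consulted:
    one request and one reply per remote node."""
    return 2 * (nodes - 1)


class LockDaemon:
    """Toy central daemon holding all share mode records on one node."""

    def __init__(self):
        self.openers = {}    # file name -> set of (node, pid)
        self.messages = 0    # messages exchanged with smbd processes

    def request(self, op, filename, node, pid):
        """Handle one lock or unlock request from an smbd process.

        Returns the other current holders of the file, sorted.
        """
        self.messages += 2   # one request in, one reply out
        holders = self.openers.setdefault(filename, set())
        others = sorted(holders - {(node, pid)})
        if op == "lock":
            holders.add((node, pid))
        elif op == "unlock":
            holders.discard((node, pid))
        else:
            raise ValueError(f"unknown operation: {op}")
        return others
```

Under this model a lock operation in a four-node cluster costs six messages when every peer is consulted, but only two when a single daemon holds the records, independent of cluster size.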

