Clustered Samba: Difference between revisions

From SambaWiki
m (new files link)
No edit summary
Line 1: Line 1:
Current [[Samba]] implementations can't be used directly on [[Cluster|clustered systems]] cause they are oriented to single-server systems. The main reason is that [[smbd]] processes use [[TDB]] (local-oriented database) for [[messaging]], storing shared data, etc - there is no way to coordinate [[smbd]] processes run on different cluster nodes.
Current [[Samba]] implementations can't be used directly on [[Cluster|clustered systems]] cause they are oriented to single-server systems. The main reason is that [[smbd]] processes use [[TDB]] (local-oriented database) for [[messaging]], storing shared data, etc - there is no way to coordinate [[smbd]] processes run on different cluster nodes.


Simple [[Clustered Samba]] system looks like this:
A simple [[Clustered Samba]] system looks like this:


[[Image:clustered_samba.png]]
[[Image:clustered_samba.png]]
Line 7: Line 7:
(you can get all images's SVG sources here - [http://samba.org/~ab/dralex/])
(you can get all images's SVG sources here - [http://samba.org/~ab/dralex/])


Each node has its own [[smbd]] daemon - they should communicate with each other to avoid shared data corruption, treat [[oplock|oplocks]] and so on.
Each node has its own [[smbd]] daemon - they should communicate with
each other to avoid shared data corruption, treat [[oplock|oplocks]] and
So the most problem is the extending of the [[locking]] subsystem ([[share mode locks|share mode locks]], [[oplock|opportunistic locks]], [[byte-range lock|byte-range locks]]) to multi-node system and cluster point of view.
so on. So the most problem is the extending of the [[locking]] subsystem
([[share mode locks|share mode locks]], [[oplock|opportunistic locks]],
[[byte-range lock|byte-range locks]]) to multi-node system and cluster
point of view.


== The Locking Problem ==
== Clustered Samba Prototypes ==


Two prototypes of a clustered Samba implementation have been developed in
Here we'll gather information on how locking mechanisms are done in [[Samba3]].
branches on svn.samba.org:
* http://viewcvs.samba.org/cgi-bin/viewcvs.cgi/branches/tmp/jpeach-cluster/
* http://viewcvs.samba.org/cgi-bin/viewcvs.cgi/branches/tmp/vl-cluster/


== The Cluster Synchronization Problem ==
Locking is used in different file operations (opening, closing, deleting), but the main place is huge [[open_file_ntcreate]] function (in fact, [[open_file_ntcreate]] gives high-level interface to share access testing).


The fundamental problem in implementing a clustered Samba is distributing
Share mode information consist of two different parts: share mode info and deferred opens info. The former is a collection of file sharing parameters (file name, list of processes opened the file, oplock info, etc). The latter is the part of the deferring open calls mechanism.
the Samba state across the cluster.


Only a portion of the state Samba associates with each file is pushed
All locking operations are presented on the scheme [[:Image:Samba_locking_calls.png]] (based on [[Analysing source code|code analysis]] of [[get_share_mode_lock]] function).
down into the filesystem. We can assume that the state that is pushed
into the filesystem is distributed by the cluster itself. This leaves us
with the problem of distributing information that is stored outside the
filesystem.

From the point of view of providing strong data integrity, the primary
information we need to be concerned about is the locking state. There is an
implicit requirement to ensure that any information we manually distribute
across the cluster is node-independent.

=== Distributing Locking State ===

Here we'll gather information on how locking mechanisms are done in
[[Samba3]].

Locking is used in different file operations (opening, closing, deleting),
but the main place is huge [[open_file_ntcreate]] function (in fact,
[[open_file_ntcreate]] gives high-level interface to share access
testing).

Share mode information consist of two different parts: share mode info
and deferred opens info. The former is a collection of file sharing
parameters (file name, list of processes opened the file, oplock info,
etc). The latter is the part of the deferring open calls mechanism.

All locking operations are presented on the scheme
[[:Image:Samba_locking_calls.png]] (based on [[Analysing source code|code
analysis]] of [[get_share_mode_lock]] function).


Original smbd's locking operations include:
Original smbd's locking operations include:
Line 24: Line 60:
* local file operations (file renaming and deleting, posix locking, etc);
* local file operations (file renaming and deleting, posix locking, etc);
* modifications of locking storage (''locking.tdb'');
* modifications of locking storage (''locking.tdb'');
* sending messages to other lockers
* sending messages to other lock holders


All this actions are made under a tdb chain lock to avoid file system or ''locking.tdb'' conflict.
All this actions are made under a tdb chain lock to avoid file system or
''locking.tdb'' conflict.


Present locking mechanism uses single ''locking.tdb'' (database with shared information - share mode entries, [[deferred open]] entries, flags, etc) for all [[smbd]] processes:
Present locking mechanism uses single ''locking.tdb'' (database with
shared information - share mode entries, [[deferred open]] entries,
flags, etc) for all [[smbd]] processes:


[[Image:samba locking 1.png]]
[[Image:samba locking 1.png]]


=== Course of development ===
=== Cluster-Independent State Information ===

== Proposed Cluster Architecture ==


On [[Cluster|clustered systems]] internal [[messaging]] can be used to interconnect [[smbd]] processes and locking databases:
On [[Cluster|clustered systems]] internal [[messaging]] can be used to
interconnect [[smbd]] processes and locking databases:


[[Image:samba locking 2.png]]
[[Image:samba locking 2.png]]


This approach implies too many messaging opearations between [[smbd]] processes while locking/unlocking a file. They can be reduced noticeably if all locking information is stored on the one node - with the [[locking daemon]]:
This approach implies too many messaging opearations between [[smbd]]
processes while locking/unlocking a file. They can be reduced noticeably
if all locking information is stored on the one node - with the [[locking
daemon]]:


[[Image:samba locking 3.png]]
[[Image:samba locking 3.png]]

Revision as of 03:05, 29 May 2006

Current Samba implementations can't be used directly on clustered systems because they are designed for single-server systems. The main reason is that smbd processes use TDB (a database meant for local use) for messaging, storing shared data, and so on, so there is no way to coordinate smbd processes running on different cluster nodes.

A simple Clustered Samba system looks like this:

Clustered samba.png

(the SVG sources of all images are available at http://samba.org/~ab/dralex/)

Each node runs its own smbd daemon. The daemons must communicate with each other to avoid corrupting shared data, to handle oplocks, and so on. The main problem is therefore extending the locking subsystem (share mode locks, opportunistic locks, byte-range locks) from a single-server view to a multi-node, cluster-wide view.

Clustered Samba Prototypes

Two prototypes of a clustered Samba implementation have been developed in branches on svn.samba.org:

   * http://viewcvs.samba.org/cgi-bin/viewcvs.cgi/branches/tmp/jpeach-cluster/
   * http://viewcvs.samba.org/cgi-bin/viewcvs.cgi/branches/tmp/vl-cluster/ 

The Cluster Synchronization Problem

The fundamental problem in implementing a clustered Samba is distributing the Samba state across the cluster.

Only a portion of the state Samba associates with each file is pushed down into the filesystem. We can assume that the state that is pushed into the filesystem is distributed by the cluster itself. This leaves us with the problem of distributing information that is stored outside the filesystem.

From the point of view of providing strong data integrity, the primary information we need to be concerned about is the locking state. There is an implicit requirement to ensure that any information we manually distribute across the cluster is node-independent.

Distributing Locking State

This section gathers information on how the locking mechanisms are implemented in Samba3.

Locking is used in several file operations (opening, closing, deleting), but most of it happens in the large open_file_ntcreate function (in fact, open_file_ntcreate provides the high-level interface to share access testing).

Share mode information consists of two parts: share mode info and deferred opens info. The former is a collection of file sharing parameters (file name, list of processes that have opened the file, oplock info, etc). The latter belongs to the mechanism for deferring open calls.
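The two parts described above can be pictured as one record per file. The following is a minimal sketch only: the class and field names are hypothetical and do not reproduce the actual Samba3 structures stored in locking.tdb.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShareModeEntry:
    """One process that currently has the file open (hypothetical fields)."""
    pid: int              # process that opened the file
    share_access: int     # sharing flags requested at open time
    oplock_type: int = 0  # 0 means "no oplock" in this sketch


@dataclass
class DeferredOpenEntry:
    """An open call that was postponed and is to be retried later."""
    pid: int
    request_id: int       # hypothetical identifier of the pending request


@dataclass
class ShareModeRecord:
    """Everything kept for one file in this sketch."""
    file_name: str
    share_modes: List[ShareModeEntry] = field(default_factory=list)
    deferred_opens: List[DeferredOpenEntry] = field(default_factory=list)

    def openers(self) -> List[int]:
        """Return the PIDs of all processes that have the file open."""
        return [entry.pid for entry in self.share_modes]


# Usage: one process holds the file, another open is deferred.
record = ShareModeRecord("report.doc")
record.share_modes.append(ShareModeEntry(pid=1201, share_access=0x3))
record.deferred_opens.append(DeferredOpenEntry(pid=1305, request_id=7))
```

Keeping both lists in one record means a single lookup by file yields the current openers and the pending opens together.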

All locking operations are shown in the diagram Image:Samba_locking_calls.png (based on code analysis of the get_share_mode_lock function).

The original smbd's locking operations include:

  • local file operations (file renaming and deleting, POSIX locking, etc);
  • modifications of locking storage (locking.tdb);
  • sending messages to other lock holders.

All these actions are performed under a tdb chain lock to avoid conflicts in the file system or in locking.tdb.
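The pattern of doing all three kinds of action under one chain lock can be sketched as follows. This is an in-memory emulation, not the tdb API: ToyLockingDb, its methods and the message format are hypothetical, and a threading.Lock per hash chain stands in for the tdb chain lock.

```python
import threading
from collections import defaultdict


class ToyLockingDb:
    """In-memory stand-in for locking.tdb with one lock per hash chain."""

    def __init__(self, chains: int = 16):
        self._chain_locks = [threading.Lock() for _ in range(chains)]
        self._records = defaultdict(list)  # file key -> list of opener PIDs
        self.sent_messages = []            # (to_pid, text) pairs, for inspection

    def _chain_lock(self, key: str) -> threading.Lock:
        # Keys that hash to the same chain share one lock.
        return self._chain_locks[hash(key) % len(self._chain_locks)]

    def open_file(self, key: str, pid: int) -> None:
        """Record a new opener and notify earlier holders under the chain lock."""
        with self._chain_lock(key):
            # 1. a local file operation would happen here (omitted in this sketch)
            # 2. modify the locking storage
            earlier_holders = list(self._records[key])
            self._records[key].append(pid)
            # 3. send messages to the other lock holders
            for holder in earlier_holders:
                self.sent_messages.append((holder, f"{pid} opened {key}"))

    def holders(self, key: str):
        """Return a copy of the opener list, read under the same chain lock."""
        with self._chain_lock(key):
            return list(self._records[key])


# Usage: the second open sees the first holder and notifies it.
db = ToyLockingDb()
db.open_file("a.txt", 100)
db.open_file("a.txt", 200)
```

Because every step runs while the chain lock is held, a second process working on the same file cannot observe the record between the storage update and the notifications.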

The present locking mechanism uses a single locking.tdb (a database with shared information: share mode entries, deferred open entries, flags, etc) for all smbd processes:

Samba locking 1.png

Cluster-Independent State Information

Proposed Cluster Architecture

On clustered systems, internal messaging can be used to interconnect smbd processes and locking databases:

Samba locking 2.png

This approach requires too many messaging operations between smbd processes while locking or unlocking a file. Their number can be reduced noticeably if all locking information is stored on a single node, managed by the locking daemon:

Samba locking 3.png
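The difference between the two layouts can be illustrated with a rough message count. The cost model below is an assumption made for illustration only, not a measurement and not taken from the prototypes: it supposes that the interconnected layout needs one request and one reply per remote node for each lock operation, while the locking daemon layout needs one request and one reply in total. The function names are hypothetical.

```python
def peer_to_peer_messages(nodes: int) -> int:
    """Inter-node messages for one lock operation when every peer is asked.

    Assumption: one request and one reply per remote node.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return 2 * (nodes - 1)


def locking_daemon_messages(nodes: int) -> int:
    """Inter-node messages for one lock operation with a single locking daemon.

    Assumption: one request to the daemon and one reply, and the requester
    runs on a different node than the daemon whenever nodes > 1.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return 2 if nodes > 1 else 0


# Usage: compare both layouts for a few cluster sizes.
for n in (2, 4, 8):
    print(n, peer_to_peer_messages(n), locking_daemon_messages(n))
```

Under these assumptions the interconnected layout grows linearly with the number of nodes, while the daemon layout stays constant, at the price of concentrating all locking state on one node.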