Difference between revisions of "Setting up a cluster filesystem"

m (Inconsistent device numbers)
(Add section about device and inode number consistency)
 
(2 intermediate revisions by the same user not shown)
Line 12: Line 12:
 
= Cluster file systems =
 
  
=== Components ===
+
== Components ==
  
 
Any cluster file system will have some or all of the following components:
Line 22: Line 22:
 
* User space tools for management
 
  
=== Limitations ===
+
== Limitations ==
  
 
Every clustered file system has its quirks and limitations.  Some of the file system limitations will affect the configuration of file services (Samba or NFS).
 
Line 30: Line 30:
 
* Does the file system have specific quorum requirements?
  
=== Implementation ===
+
=== Checking uniformity of device and inode numbering ===
 +
 
 +
File services (e.g. Samba or NFS) often generate file identifiers or handles from device and inode numbers.  These services may not work correctly if these numbers are not uniform across nodes.
 +
 
 +
This can be tested using the stat(1) command as follows:
 +
 
 +
# onnode all stat -c '%d:%i' /clusterfs/testfile
 +
 +
>> NODE: 10.1.1.1 <<
 +
41:35820037
 +
 +
>> NODE: 10.1.1.2 <<
 +
41:35820037
 +
 +
>> NODE: 10.1.1.3 <<
 +
38:35820037
 +
 
 +
Note that the device numbers are not consistent across nodes.  File services sometimes provide a way of working around this (e.g. [[Configuring_clustered_Samba#Filesystem_specific_configuration|Samba]]).
 +
 
 +
Some cluster filesystems (especially some [https://en.wikipedia.org/wiki/Filesystem_in_Userspace FUSE]-based ones) do not provide consistent inode numbers across nodes.  There is often no workaround for this.
 +
 
 +
== Implementation ==
  
 
Each clustered file system example will describe how to set up a clustered file system for a 3-node cluster.  The implementation can be scaled down to 2 nodes or scaled up to more nodes.
Line 61: Line 82:
  
 
[https://oss.oracle.com/projects/ocfs2/ OCFS2] is a general-purpose shared-disk cluster file system for Linux capable of providing both high performance and high availability.
 
 
= Other cluster filesystems =
 
 
If you can't find documentation about your choice of cluster filesystem and clustered Samba then you might need to work around some limitations.
 
 
== Inconsistent device numbers ==
 
 
''Note: This section probably wants to be in a future page about cluster filesystems and Samba configuration.  It can be moved later...''
 
 
Locking will not work if a cluster filesystem does not provide consistent device numbers across nodes.
 
 
Consider the following example:
 
 
# onnode all stat /clusterfs/testfile
 
 
>> NODE: 10.1.1.1 <<
 
  File: `/clusterfs/testfile'
 
  Size: 1286700      Blocks: 2514      IO Block: 65536  regular file
 
Device: '''29h/41d'''    Inode: 35820037    Links: 1
 
Access: (0774/-rwxrwxr--)  Uid: ( 3535/    foo)  Gid: (  513/Domain Users)
 
Access: 2016-11-03 19:51:46.000000000 +0000
 
Modify: 2016-11-01 13:06:04.000000000 +0000
 
Change: 2016-11-01 13:06:04.000000000 +0000
 
 
>> NODE: 10.1.1.2 <<
 
  File: `/clusterfs/testfile'
 
  Size: 1286700      Blocks: 2514      IO Block: 65536  regular file
 
Device: '''29h/41d'''    Inode: 35820037    Links: 1
 
Access: (0774/-rwxrwxr--)  Uid: ( 3535/    foo)  Gid: (  513/Domain Users)
 
Access: 2016-11-03 19:51:46.000000000 +0000
 
Modify: 2016-11-01 13:06:04.000000000 +0000
 
Change: 2016-11-01 13:06:04.000000000 +0000
 
 
>> NODE: 10.1.1.3 <<
 
  File: `/clusterfs/testfile'
 
  Size: 1286700      Blocks: 2514      IO Block: 65536  regular file
 
Device: '''26h/38d'''    Inode: 35820037    Links: 1
 
Access: (0774/-rwxrwxr--)  Uid: ( 3535/    foo)  Gid: (  513/Domain Users)
 
Access: 2016-11-03 19:51:46.000000000 +0000
 
Modify: 2016-11-01 13:06:04.000000000 +0000
 
Change: 2016-11-01 13:06:04.000000000 +0000
 
 
Note that the device numbers are not consistent across nodes.  Locks set for the file on the first 2 nodes will not affect the 3rd node.
 
 
To work around this, the following settings should be in the global section of the Samba configuration:
 
 
vfs objects = fileid
 
fileid:mapping = fsname
 
 
See [https://www.samba.org/samba/docs/man/manpages/vfs_fileid.8.html vfs_fileid(8)] for more information.
 

Latest revision as of 04:47, 8 January 2018

Goal

Set up a clustered file system to be used with CTDB for providing clustered file services.

This page also covers:

  • How to test whether POSIX locking is supported on the file system
  • Limitations when using a clustered file system

Setting up a clustered file system is independent of CTDB; this information is provided for completeness. Users should be aware of the limitations of their particular clustered file system.

Cluster file systems

Components

Any cluster file system will have some or all of the following components:

  • Shared or distributed storage
  • Kernel or user space file system driver
  • User space file system daemon(s)
  • User space distributed lock manager
  • User space tools for management

Limitations

Every clustered file system has its quirks and limitations. Some of the file system limitations will affect the configuration of file services (Samba or NFS).

  • Does the file system provide a consistent view across all the nodes (for example, uniform device and inode numbering)?
  • Does the file system provide POSIX locking semantics (cluster-aware locking)?
  • Does the file system have specific quorum requirements?
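
The locking question in the list above can be probed with a small script. The following is a minimal sketch, not part of CTDB or Samba: the helper name try_exclusive_lock is hypothetical, and it only assumes that the cluster file system is mounted at the same path on every node and that Python's standard fcntl module is available.

```python
import errno
import fcntl


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive POSIX (fcntl) lock on a whole file.

    Returns the open file object if the lock was granted (keep it open to
    hold the lock), or None if another process holds a conflicting lock.
    """
    f = open(path, "a+b")  # creates the file if missing, never truncates it
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        f.close()
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return None  # the lock is held by some other process
        raise  # any other error (e.g. locking not supported) is passed on
    return f
```

One way to use it: call try_exclusive_lock on a test file on one node and keep the returned file object open, then call it for the same file on a second node. If locking is cluster-aware, the second call should return None while the first lock is held. If both nodes are granted the lock at the same time, locks are probably not being propagated between nodes.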

Checking uniformity of device and inode numbering

File services (e.g. Samba or NFS) often generate file identifiers or handles from device and inode numbers. These services may not work correctly if these numbers are not uniform across nodes.

This can be tested using the stat(1) command as follows:

# onnode all stat -c '%d:%i' /clusterfs/testfile

>> NODE: 10.1.1.1 <<
41:35820037

>> NODE: 10.1.1.2 <<
41:35820037

>> NODE: 10.1.1.3 <<
38:35820037

Note that the device numbers are not consistent across nodes. File services sometimes provide a way of working around this (e.g. Samba).

Some cluster filesystems (especially some FUSE-based ones) do not provide consistent inode numbers across nodes. There is often no workaround for this.
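
The comparison above can also be done mechanically. The following is a minimal sketch that parses output in the format shown above and reports the nodes that disagree with the majority. The function names parse_onnode_stat and find_outliers are hypothetical and not part of CTDB; the sample text simply repeats the output from this section.

```python
import re
from collections import Counter

# Output format as shown above: a ">> NODE: <address> <<" header per node,
# followed by one "<device>:<inode>" line from stat -c '%d:%i'.
NODE_RE = re.compile(r"^>> NODE: (\S+) <<$")
ID_RE = re.compile(r"^(\d+):(\d+)$")

SAMPLE = """\
>> NODE: 10.1.1.1 <<
41:35820037

>> NODE: 10.1.1.2 <<
41:35820037

>> NODE: 10.1.1.3 <<
38:35820037
"""


def parse_onnode_stat(text):
    """Map each node address to its (device, inode) pair."""
    ids = {}
    node = None
    for line in text.splitlines():
        line = line.strip()
        header = NODE_RE.match(line)
        if header:
            node = header.group(1)
            continue
        pair = ID_RE.match(line)
        if pair and node is not None:
            ids[node] = (int(pair.group(1)), int(pair.group(2)))
            node = None
    return ids


def find_outliers(ids):
    """Return the nodes whose pair differs from the most common pair."""
    if not ids:
        return {}
    majority, _ = Counter(ids.values()).most_common(1)[0]
    return {node: pair for node, pair in ids.items() if pair != majority}
```

For the sample output, find_outliers(parse_onnode_stat(SAMPLE)) reports only node 10.1.1.3, whose device number is 38 rather than 41. Nodes where stat fails produce no matching line and are simply absent from the result, so it is worth checking that every expected node appears in the parsed mapping.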

Implementation

Each clustered file system example will describe how to set up a clustered file system for a 3-node cluster. The implementation can be scaled down to 2 nodes or scaled up to more nodes.

GPFS

GPFS is a proprietary cluster file system from IBM.


GFS2

GFS2 is a clustered file system supported by Red Hat.


Lustre

The Lustre file system is an open-source parallel file system that supports many requirements of leadership-class HPC simulation environments.


GlusterFS

GlusterFS is a scalable network file system.


OCFS2

OCFS2 is a general-purpose shared-disk cluster file system for Linux capable of providing both high performance and high availability.