GlusterFS is a free and open source scalable filesystem. It can be used for cloud storage or to store data in a local network. It can also be used to set up an active-active filesystem cluster with failover and load balancing via DNS round robin. Together with CTDB it is possible to build a fileserver for a network with the following advantages:
- Expandable without downtime
- Mount Gluster volumes via the network
- POSIX ACL support
- Different configurations possible (depending on your needs)
- Snapshot support if thinly provisioned LVM2 volumes are used for the bricks
The different configurations available are:
- Replicated Volume
- Distributed Volume
- Striped Volume
- Replicated-Distributed Volume
- Dispersed Volume
To read more about the different configurations see:
|This article is part of the CTDB setup, so it only shows how to set up a replicated volume to be used with CTDB. The setup will be a two-node replicated volume with 2GB of disk space, so it will be easy to reproduce.|
What you need
- Two hosts with two network cards
- An empty partition of 2GB to create the volume on each host
- Two IP addresses from your production network
- Two IP addresses for the heartbeat network
- The GlusterFS packages version 7.x
Hostnames and IPs
Here you see two tables with the used IP-addresses on both hosts.
If a client connects to the Gluster cluster, an IP address from the production network is used.
The heartbeat network is only used for communication between the Gluster nodes.
You need two mountpoints, one for the physical brick and one for the volume.
|Mountpoint||What to mount|
|/gluster||The brick on each node|
|/glusterfs||For the volume on each node|
Setting up the LVM-partition
|Be sure that you are working with the right partition. You will lose all data if you choose the wrong partition.|
The first step is setting up the replicated Gluster volume with two nodes. As an example, a partition of 2GB is used.
root@cluster-01:~# fdisk /dev/sdc
root@cluster-01:~# apt install lvm2 thin-provisioning-tools
root@cluster-01:~# pvcreate /dev/sdc1
Physical volume "/dev/sdc1" successfully created.
root@cluster-01:~# vgcreate glustergroup /dev/sdc1
Volume group "glustergroup" successfully created
root@cluster-01:~# lvcreate -L 1950M -T glustergroup/glusterpool
Using default stripesize 64,00 KiB
Rounding up size to full physical extent 1,91 GiB
Logical volume "glusterpool" created.
root@cluster-01:~# lvcreate -V 1900M -T glustergroup/glusterpool -n glusterv1
Using default stripesize 64,00 KiB.
Logical volume "glusterv1" created.
root@cluster-01:~# mkfs.xfs /dev/glustergroup/glusterv1
root@cluster-01:~# mkdir /gluster
root@cluster-01:~# mount /dev/glustergroup/glusterv1 /gluster
root@cluster-01:~# echo /dev/glustergroup/glusterv1 /gluster xfs defaults 0 0 >> /etc/fstab
root@cluster-01:~# mkdir /gluster/brick
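The message "Rounding up size to full physical extent 1,91 GiB" in the listing appears because LVM allocates space in whole physical extents. The following is only a rough sketch of that arithmetic. It assumes the LVM default extent size of 4 MiB (the listing does not show the extent size), and the helper name round_up_to_extent is made up for this example:

```shell
#!/bin/sh
# Round a requested size (in MiB) up to a whole number of physical extents.
# Assumption: 4 MiB extents, the LVM default, unless a second argument is given.
round_up_to_extent() {
    size_mib=$1
    extent_mib=${2:-4}
    echo $(( (size_mib + extent_mib - 1) / extent_mib * extent_mib ))
}

round_up_to_extent 1950   # thin pool from the listing: prints 1952
round_up_to_extent 1900   # thin volume from the listing: prints 1900
```

Under these assumptions the 1950M pool becomes 1952 MiB, which is roughly 1.91 GiB, while the 1900M thin volume already fills whole extents and is not rounded.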
Do all the steps on both nodes.
Creating the peer pool
|Make sure that you are using the hostnames from the heartbeat network, so that the communication between the nodes uses the heartbeat network.|
Before you can create the volume you have to set up a peer pool by adding the two hosts as peers to the pool. The next listing shows the command to add the second Gluster node to the pool. You have to run it on the first Gluster host:
root@cluster-01:~# gluster peer probe c-02
peer probe: success.
If you try to add the peer, you may get one of the following error messages:
root@cluster-01:~# gluster peer probe c-02
Connection failed. Please check if gluster daemon is operational.
root@cluster-01:~# gluster peer probe c-02
peer probe: failed: Probe returned with Transport endpoint is not connected
The first error message indicates that glusterd is not running on the host where you issued the command. Restart the daemon there:
systemctl restart glusterd
The second error message indicates that the daemon is not running on the peer you are trying to add to the pool. Restart glusterd on the other node.
Once you have added node c-02 from node c-01, add host c-01 to the trusted pool on node c-02:
root@cluster-02:~# gluster peer probe c-01
peer probe: success.
Now you can check the status of each node and take a look at the list of all nodes with the gluster-command
root@cluster-01:~# gluster peer status
Number of Peers: 1
Hostname: c-02
Uuid: aca7d361-51df-4d1f-9b0f-4cf494029f21
State: Peer in Cluster (Connected)
root@cluster-02:~# gluster peer status
Number of Peers: 1
Other names: c-02
Hostname: c-01.heartbeat.net
Uuid: adafbf93-e716-4d99-bf89-e8044d57e3aa
State: Peer in Cluster (Connected)
Other names: c-01
root@cluster-02:~# gluster pool list
UUID                                  Hostname            State
adafbf93-e716-4d99-bf89-e8044d57e3aa  c-01.heartbeat.net  Connected
aca7d361-51df-4d1f-9b0f-4cf494029f21  localhost           Connected
On each host you will find the information about the peer in /var/lib/glusterd/peers/<UUID>.
Now all the peers you need for the Gluster volume have been added to the pool.
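To look at the peer files under /var/lib/glusterd/peers/ without opening each one, a small helper like the following may be useful. It is only a sketch. It assumes that each peer file is named after the peer's UUID and contains key=value lines including a hostname1= entry; check a file on your own system first, as the layout may differ between versions. The function name list_peer_hostnames is made up for this example:

```shell
#!/bin/sh
# Print "<file name> <hostname>" for every peer file in a glusterd peers directory.
# Assumption: each file holds key=value lines, one of them starting with hostname1=.
list_peer_hostnames() {
    peers_dir=${1:-/var/lib/glusterd/peers}
    for peer_file in "$peers_dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -f "$peer_file" ] || continue
        host=$(sed -n 's/^hostname1=//p' "$peer_file")
        printf '%s %s\n' "$(basename "$peer_file")" "$host"
    done
}
```

Called without an argument as root, list_peer_hostnames reads the default directory. A different directory can be passed as the first argument, for example to inspect a copy of the files.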
The Gluster volume
The next step is creating the volume. But before we create the volume from two bricks, let me explain a few things. If you create the volume and pass just two bricks as parameters, you will see a warning that it's not a good idea to create a replicated volume with only two bricks, because you will not be able to set up a quorum. In a production environment you should always create a replicated volume with an odd number of nodes, because of the quorum.
What is the quorum and why is it so important?
Suppose you set up three nodes and lose the connection between node-1 and the other nodes (node-2 and node-3), while the clients from the production network can still reach all three nodes and all three nodes are still running glusterd. One client can then connect to node-1 and change a file. Another client can connect to the rest of the Gluster cluster (node-2 and node-3) and change the same file, because node-2 and node-3 can no longer communicate with node-1 about open files.
You will get a split brain of your cluster as soon as the connection is reestablished. If you configure a quorum of 51%, the two nodes that can still communicate (node-2 and node-3) will meet the quorum, but the other node (node-1) will not. The Gluster daemon on node-1 will then stop accepting changes from clients: the node will either stop the service or switch to read-only status.
With two nodes you can't set up a quorum, because each node is 50% of the cluster. Only with an odd number of nodes can you configure a properly working quorum. That's why you get the warning when creating the volume with two nodes. But it's just a warning.
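The percentages in the quorum explanation can be checked with a few lines of shell. This is only a sketch of the arithmetic, not of how Gluster implements quorum internally. The function name quorum_state and its arguments are made up for this example:

```shell
#!/bin/sh
# quorum_state REACHABLE TOTAL RATIO_PERCENT
# Reports whether a group of REACHABLE nodes out of TOTAL reaches the ratio.
quorum_state() {
    reachable=$1
    total=$2
    ratio=$3
    # Compare reachable/total >= ratio/100 using integer arithmetic only.
    if [ $(( reachable * 100 )) -ge $(( total * ratio )) ]; then
        echo "quorum met"
    else
        echo "quorum lost"
    fi
}

quorum_state 2 3 51   # node-2 and node-3 together: quorum met
quorum_state 1 3 51   # node-1 alone: quorum lost
quorum_state 1 2 51   # either node of a two-node cluster: quorum lost
```

The last call shows why a two-node cluster cannot use a 51% quorum: after a split, neither half reaches the ratio.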
CTDB and quorum
This problem applies to CTDB too. In the future the developers plan to introduce an optional quorum where nodes will have to be connected to >50% of the configured nodes before they can join the cluster.
With 2 nodes it is very easy to get a stupid form of split brain. Node A is shut down and node B is active, updating information in persistent databases (perhaps id-mapping info?). Node B is shut down and node A is restarted. Now node A's old database is in use for a while. When node B is restarted then some databases from A might be used and some from B - it depends on the sequence numbers.
Creating the volume
Now choose one of the nodes to create the volume. It doesn't matter which node you choose:
root@cluster-01:~# gluster volume create gv0 replica 2 c-01:/gluster/brick \
                   c-02:/gluster/brick
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
(y/n) y
volume create: gv0: success: please start the volume to access data
This is the warning I mentioned before. But by typing a "y" you can create the volume anyway.
Now let's take a look at the setup and the status of the volume. Here you can see the result:
root@cluster-01:~# gluster v info
Volume Name: gv0
Type: Replicate
Volume ID: 5d1e1031-5474-48e9-9451-1dbeb5ebb79e
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: c-01:/gluster/brick
Brick2: c-02:/gluster/brick
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
root@cluster-01:~# gluster v status
Volume gv0 is not started
With gluster v info or gluster volume info you will see the setup of the volume and, at the end of the output, a list of the parameters that are set. The command gluster v status tells you that the volume is not running. You have to start the volume before you can use it. Here you see the command and the new status:
root@cluster-01:~# gluster v start gv0
volume start: gv0: success
root@cluster-01:~# gluster v status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c-01:/gluster/brick                   49152     0          Y       1583
Brick c-02:/gluster/brick                   49152     0          Y       9830
Self-heal Daemon on localhost               N/A       N/A        Y       1604
Self-heal Daemon on c-02                    N/A       N/A        Y       9851
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks
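If you want to check the Online column of the status output from a script, a small filter like the following can help. It is a sketch that assumes the column layout shown in the listing above, with every brick on a single line. Long brick paths may be wrapped onto two lines in real output, which this filter does not handle. The function name offline_bricks is made up for this example:

```shell
#!/bin/sh
# Read the output of "gluster volume status <volume>" on stdin and print
# every brick whose Online column (second-to-last field) is not "Y".
offline_bricks() {
    awk '$1 == "Brick" && $(NF-1) != "Y" { print $2 }'
}
```

A typical call would be gluster v status gv0 | offline_bricks. For the status output shown above it prints nothing, because both bricks are online.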
Now the volume is running and ready to use.
Setting some Samba-Options
To get better performance from Gluster when connecting via SMB, you can set some options on your Gluster volume. Starting with Gluster version 6, most of these options are bundled into a group of options.
Starting with Gluster 7 the group-options file for Samba is no longer part of the Debian packages, so I created a new group-options file named my-samba:
cluster.self-heal-daemon=enable
performance.cache-invalidation=on
server.event-threads=4
client.event-threads=4
performance.parallel-readdir=on
performance.readdir-ahead=on
performance.nl-cache-timeout=600
performance.nl-cache=on
network.inode-lru-limit=200000
performance.md-cache-timeout=600
performance.stat-prefetch=on
performance.cache-samba-metadata=on
features.cache-invalidation-timeout=600
features.cache-invalidation=on
nfs.disable=on
cluster.data-self-heal=on
cluster.metadata-self-heal=on
cluster.entry-self-heal=on
cluster.force-migration=disable
You have to put the file in /var/lib/glusterd/groups/ under the name my-samba. If you find a file named samba in this directory, then you already have the original group file.
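Before applying a hand-written group file it can be worth checking that every line really has the option=value form used above. The following is a minimal sketch. The function name check_group_file is made up for this example, and the check only looks at the line format, not at whether Gluster knows the option:

```shell
#!/bin/sh
# Print every non-empty line of a group file that is not of the form
# option=value. Returns 0 if all lines look fine, 1 if any line is
# malformed, and 2 if the file cannot be read.
check_group_file() {
    [ -r "$1" ] || return 2
    # grep -v selects the lines matching neither pattern, i.e. the bad ones.
    if grep -n -v -E -e '^[[:space:]]*$' \
            -e '^[A-Za-z0-9._-]+=[^=[:space:]]+$' "$1"; then
        return 1
    fi
    return 0
}
```

Running check_group_file /var/lib/glusterd/groups/my-samba prints nothing and returns 0 for the file shown above. A line such as "nfs.disable on" would be printed together with its line number.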
You only have to set these options on one of your nodes. Here you see the command to set and list the new options:
root@cluster-02:~# gluster v set gv0 group my-samba
volume set: success
root@cluster-02:~# gluster v info
Volume Name: gv0
Type: Replicate
Volume ID: 5d1e1031-5474-48e9-9451-1dbeb5ebb79e
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: c-01:/gluster/brick
Brick2: c-02:/gluster/brick
Options Reconfigured:
cluster.force-migration: disable
cluster.entry-self-heal: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.stat-prefetch: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.readdir-ahead: on
performance.parallel-readdir: on
client.event-threads: 4
server.event-threads: 4
performance.cache-invalidation: on
cluster.self-heal-daemon: enable
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
As you can see, all the options are set. If you would like to set a single option you can do it with gluster v set <volume-name> <option> <value>. To reset an option to its original value use gluster v reset <volume-name> <option>. To see all options use gluster v get <volume-name> all.