GlusterFS

Fundamentals

GlusterFS is a free and open source scalable filesystem. It can be used for cloud storage or to store data in a local network. It can be used to set up an active-active filesystem cluster with failover and load balancing via DNS round-robin. Together with CTDB it is possible to build a fileserver for a network with the following advantages:

  • Expandable without downtime
  • Mount Gluster volumes via the network
  • POSIX ACL support
  • Different configurations possible (depending on your needs)
  • Self-healing
  • Support for snapshots if thinly provisioned LVM2 is used for the bricks

The different configurations available are:

  • Replicated Volume
  • Distributed Volume
  • Striped Volume
  • Replicated-Distributed Volume
  • Dispersed Volume

To read more about the different configurations, see the GlusterFS documentation at docs.gluster.org.
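
The volume type is selected when the volume is created: a replicated volume keeps a full copy of every file on each of its bricks, a distributed volume (the default) spreads the files across the bricks, and a dispersed volume uses erasure coding. A rough sketch of the create syntax, using hypothetical hosts host-1 to host-3 and brick paths that are not the ones used later in this guide:

root@host-1:~# gluster volume create vol-rep replica 3 host-1:/bricks/b1 host-2:/bricks/b1 host-3:/bricks/b1
root@host-1:~# gluster volume create vol-dist host-1:/bricks/b2 host-2:/bricks/b2
root@host-1:~# gluster volume create vol-disp disperse 3 redundancy 1 host-1:/bricks/b3 host-2:/bricks/b3 host-3:/bricks/b3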



What you need

  • Two hosts with two network cards
  • An empty partition of 2GB to create the volume on each host
  • Two IP addresses from your production network
  • Two IP addresses for the heartbeat network
  • The GlusterFS packages version 7.x
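
On a Debian-based system the server packages could be installed roughly like this on both nodes (assuming your distribution or the Gluster project repository provides version 7.x):

root@cluster-01:~# apt install glusterfs-server
root@cluster-01:~# gluster --version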

Hostnames and IPs

Here you see two tables with the IP addresses used on both hosts.

Production network

When a client connects to the Gluster cluster, an IP address from the production network is used.

Hostname     IP-address       Network name
cluster-01   192.168.56.101   example.net
cluster-02   192.168.56.102   example.net
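
For the failover and load balancing via DNS round-robin mentioned in the fundamentals, both production addresses would typically be published under a single name. A sketch of the records in the example.net zone (the name "cluster" is only an example):

cluster    IN A    192.168.56.101
cluster    IN A    192.168.56.102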

Heartbeat network

The heartbeat network is only used for the communication between the Gluster nodes.

Hostname   IP-address       Network name
c-01       192.168.57.101   heartbeat.net
c-02       192.168.57.102   heartbeat.net
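
The heartbeat names c-01 and c-02 must be resolvable on both nodes before you can probe the peers later on. If they are not managed in your DNS, a minimal sketch for /etc/hosts on both nodes could look like this:

192.168.57.101   c-01 c-01.heartbeat.net
192.168.57.102   c-02 c-02.heartbeat.net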

The mountpoints

You need two mountpoints: one for the physical brick and one for the volume.

Mountpoint   What to mount
/gluster     The brick on each node
/glusterfs   For the volume on each node

Setting up the LVM-partition

The first step is setting up the replicated Gluster volume with two nodes. As an example, a partition with 2GB is used.

root@cluster-01:~# fdisk /dev/sdc
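
fdisk is used interactively here. A typical key sequence to create a single partition spanning the whole disk (all defaults accepted) would be:

   n          create a new partition
   p          primary partition
   1          partition number 1
   <Enter>    accept the default first sector
   <Enter>    accept the default last sector
   w          write the partition table and exit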

root@cluster-01:~# apt install lvm2 thin-provisioning-tools

root@cluster-01:~# pvcreate /dev/sdc1
  Physical volume "/dev/sdc1" successfully created.

root@cluster-01:~# vgcreate glustergroup /dev/sdc1
  Volume group "glustergroup" successfully created

root@cluster-01:~# lvcreate -L 1950M -T glustergroup/glusterpool
  Using default stripesize 64,00 KiB
  Rounding up size to full physical extent 1,91 GiB
  Logical volume "glusterpool" created.

root@cluster-01:~# lvcreate -V 1900M -T glustergroup/glusterpool -n glusterv1
  Using default stripesize 64,00 KiB.
  Logical volume "glusterv1" created.
 
root@cluster-01:~# mkfs.xfs /dev/glustergroup/glusterv1

root@cluster-01:~# mkdir /gluster

root@cluster-01:~# mount /dev/glustergroup/glusterv1 /gluster

root@cluster-01:~# echo /dev/glustergroup/glusterv1 /gluster xfs defaults 0 0 >> /etc/fstab

root@cluster-01:~# mkdir /gluster/brick

Do all the steps on both nodes.

Creating the peer pool

Before you can create the volume, you have to set up a peer pool by adding the two hosts as peers to the pool. In the next listing you will see the command to add the second Gluster node to the pool. You have to run this on the first Gluster host:

root@cluster-01:~# gluster peer probe c-02
peer probe: success. 

When trying to add the peer, you may get one of the following error messages:

root@cluster-01:~# gluster peer probe c-02
Connection failed. Please check if gluster daemon is operational.

root@cluster-01:~# gluster peer probe c-02
peer probe: failed: Probe returned with Transport endpoint is not connected

The first error message indicates that glusterd is not running on the host on which you are trying to add the peer. Restart the daemon:

systemctl restart glusterd

The second error message indicates that the daemon is not running on the peer you are trying to add to the pool. Restart glusterd on the other node.
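
A quick way to check whether the daemon is running on a node is systemd itself, for example:

root@cluster-02:~# systemctl status glusterd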

After you have added the node c-02 on node c-01, add the host c-01 to the trusted pool on node c-02:

root@cluster-02:~# gluster peer probe c-01
peer probe: success. 

Now you can check the status of each node and take a look at the list of all nodes with the gluster command:

root@cluster-01:~# gluster peer status
Number of Peers: 1

Hostname: c-02
Uuid: aca7d361-51df-4d1f-9b0f-4cf494029f21
State: Peer in Cluster (Connected)
Other names:
c-02

root@cluster-02:~# gluster peer status
Number of Peers: 1

Hostname: c-01.heartbeat.net
Uuid: adafbf93-e716-4d99-bf89-e8044d57e3aa
State: Peer in Cluster (Connected)
Other names:
c-01
root@cluster-02:~# gluster pool list
UUID					Hostname          	State
adafbf93-e716-4d99-bf89-e8044d57e3aa	c-01.heartbeat.net Connected 
aca7d361-51df-4d1f-9b0f-4cf494029f21	localhost          Connected 

On each host you will find the information about the peer in /var/lib/glusterd/peers/<UUID>.

Now you have added all the peers to the pool that you will need for the Gluster volume.

The Gluster volume

The next step is creating the volume. But before we create the volume of two bricks, let me explain some things. If you start creating the volume and give just two bricks as parameters, you will see a warning that it's not a good idea to create a replicated volume with only two bricks, because you will not be able to set up a quorum. In a production environment you should always create a replicated volume with an odd number of nodes, because of the quorum.

What is the quorum and why is it so important?

Suppose you set up three nodes and you lose the connection between node-1 and the other nodes (node-2 and node-3), but the clients from the production network can still reach all three nodes and glusterd is still running on all of them. Then one client can connect to node-1 and make changes to a file, while another client connects to the rest of the Gluster cluster (node-2 and node-3) and changes the same file, because node-2 and node-3 can no longer communicate with node-1 about open files.

You will get a split brain in your cluster as soon as the connection is reestablished. If you configure a quorum of 51%, the two nodes that can still communicate (node-2 and node-3) will meet the quorum, but the other node (node-1) will not. The Gluster daemon will then stop accepting any changes from clients on node-1; the node will either stop the service or go into a read-only state.

With two nodes you can't set up a quorum, because each node represents 50% of the cluster. Only with an odd number of nodes can you configure a properly working quorum. That's why you get the warning when creating the volume with two nodes. But it's just a warning.
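
With three or more nodes, quorum enforcement could later be switched on via volume options. A minimal sketch (the 51% ratio is just an example value; see the GlusterFS documentation for the exact semantics of these options):

root@cluster-01:~# gluster volume set gv0 cluster.quorum-type auto
root@cluster-01:~# gluster volume set gv0 cluster.server-quorum-type server
root@cluster-01:~# gluster volume set all cluster.server-quorum-ratio 51%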

CTDB and quorum

This problem applies to CTDB too. In the future the developers plan to introduce an optional quorum where nodes will have to be connected to >50% of the configured nodes before they can join the cluster.

With 2 nodes it is very easy to get a stupid form of split brain. Node A is shut down and node B is active, updating information in persistent databases (perhaps id-mapping info?). Node B is shut down and node A is restarted. Now node A's old database is in use for a while. When node B is restarted then some databases from A might be used and some from B - it depends on the sequence numbers.

Creating the volume

Now choose one of the nodes to create the volume on. It doesn't matter which node you choose:

root@cluster-01:~# gluster volume create gv0 replica 2 c-01:/gluster/brick \
    c-02:/gluster/brick
Replica 2 volumes are prone to split-brain. Use Arbiter or Replica 3 to avoid \
   this. See: http://docs.gluster.org/en/latest/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/.
Do you still want to continue?
(y/n) y

volume create: gv0: success: please start the volume to access data

This is the warning I mentioned before. But by typing a "y" you can create the volume anyway.
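
If a third node were available, the warning could be avoided entirely by creating an arbiter volume (or a full replica 3 volume) instead. A sketch with a hypothetical third node c-03:

root@cluster-01:~# gluster volume create gv0 replica 3 arbiter 1 \
    c-01:/gluster/brick c-02:/gluster/brick c-03:/gluster/brick

The arbiter brick only stores metadata, so it needs far less space than the data bricks.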

Now let's take a look at the setup and the status of the volume. Here you can see the result:

root@cluster-01:~# gluster v info

Volume Name: gv0
Type: Replicate
Volume ID: 5d1e1031-5474-48e9-9451-1dbeb5ebb79e
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: c-01:/gluster/brick
Brick2: c-02:/gluster/brick
Options Reconfigured:
transport.address-family: inet
storage.fips-mode-rchecksum: on
nfs.disable: on
performance.client-io-threads: off
root@cluster-01:~# gluster v status
Volume gv0 is not started

With gluster v info or gluster volume info you will see the setup of the volume and a list of the set parameters at the end of the output. The command gluster v status tells you that the volume is not started. You have to start the volume before you can use it. Here you see the command and the new status:

root@cluster-01:~# gluster v start gv0
volume start: gv0: success
root@cluster-01:~# gluster v status gv0
Status of volume: gv0
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick c-01:/gluster/brick                   49152     0          Y       1583 
Brick c-02:/gluster/brick                   49152     0          Y       9830 
Self-heal Daemon on localhost               N/A       N/A        Y       1604 
Self-heal Daemon on c-02                    N/A       N/A        Y       9851 
 
Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

Now the volume is running and ready to use.
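
As listed in the mountpoint table above, the volume itself is then mounted on /glusterfs. A minimal sketch using the GlusterFS FUSE client (the glusterfs-client package is assumed to be installed):

root@cluster-01:~# mkdir /glusterfs
root@cluster-01:~# mount -t glusterfs c-01:/gv0 /glusterfs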

To bring up the volume every time you restart your system, you must enable glusterd in systemd:

root@cluster-02:~# systemctl enable glusterd.service
Created symlink /etc/systemd/system/multi-user.target.wants/glusterd.service -> \
      /lib/systemd/system/glusterd.service.
