Samba CTDB GlusterFS Cluster HowTo
Introduction
CTDB is a clustered database component in clustered Samba that provides a high-availability load-sharing CIFS server cluster.
The main functions of CTDB are:
- Provide a clustered version of the TDB database with automatic rebuild/recovery of the databases upon node failures.
- Monitor nodes in the cluster and services running on each node.
- Manage a pool of public IP addresses that are used to provide services to clients. Alternatively, CTDB can be used with LVS.
Combined with a cluster filesystem CTDB provides a full high-availability (HA) environment for services such as clustered Samba, NFS and other services.
Setting up CTDB
After setting up the cluster filesystem you can set up a CTDB-cluster.
To use CTDB you have to install the ctdb package for your distribution. After installing the package with all its dependencies you will find a directory /etc/ctdb. Inside this directory you need some configuration files for CTDB.
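On Debian-based distributions the package is simply called ctdb, so the installation could look like this (a sketch; the package name and package manager may differ on other distributions):

root@cluster-01:~# apt install ctdb
root@cluster-02:~# apt install ctdb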
Let's take a look at the files needed for configuring CTDB.
File | Content |
---|---|
/etc/ctdb/ctdb.conf | Basic configuration |
/etc/ctdb/script.options | Setting options for event-scripts |
/etc/ctdb/nodes | All IP-addresses of all nodes |
/etc/ctdb/public_addresses | Dynamic IP-addresses for all nodes |
The ctdb.conf file
The ctdb.conf file has changed a lot compared to the old configuration style (before Samba 4.9). This file is no longer used to configure the different services managed by CTDB. At the moment the only setting you have to make in this file is the recovery lock. This lock file is used by all nodes to check whether it is possible to lock files inside the cluster for exclusive use. If you don't use a recovery lock file, your cluster can run into a split-brain situation. By default the recovery lock is NOT set. You should not use CTDB without a recovery lock unless you know what you are doing. The variable must point to a file inside your mounted gluster volume. To use the recovery lock, enter the following line into /etc/ctdb/ctdb.conf on both nodes:
recovery lock = /glusterfs/ctdb.lock
Note: The recovery lock setting needs to be in the [cluster] section.
Note: You don't have to create the recovery lock file; it will be created by CTDB on the first start of the CTDB daemon.
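Putting this together, a minimal /etc/ctdb/ctdb.conf for this setup could look like this (a sketch; only the recovery lock is set, everything else keeps its defaults):

[cluster]
    recovery lock = /glusterfs/ctdb.lock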
The file script.options
All the services CTDB provides are started via special event scripts. In this file you can set options for these scripts; examples are shown in the file itself. One example is the option CTDB_SAMBA_SKIP_SHARE_CHECK for the service script 50.samba. By default this option is set to no, which means that CTDB checks for every share whether its path exists; if a path does not exist, the node becomes unhealthy. But if you use the vfs module glusterfs you have no local path in the share configuration: the share points to a directory on your gluster volume, so CTDB can't check the path. So if you are going to use glusterfs, you must set this option to yes.
Because you can set all options for all service scripts in this file, you don't have to change any of the service scripts. You will find more information on all options in the manpage man ctdb-script.options.
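For the glusterfs setup described here, a minimal /etc/ctdb/script.options could therefore contain just this single line (a sketch; all other options keep their defaults):

CTDB_SAMBA_SKIP_SHARE_CHECK=yes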
The file nodes
CTDB must know all hosts belonging to its cluster; in this file you have to put the IPs of all nodes from the heartbeat network. This file must have the same content on all nodes. Just put the two IPs of the two nodes into the file. Here you see the content of the file:
192.168.57.42
192.168.57.43
In most distributions the file doesn't exist, so you have to create it.
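Since the file must be identical on both nodes, you could create it once and copy it over, for example like this (a sketch; it assumes SSH access between the nodes and uses the heartbeat IPs from above):

root@cluster-01:~# printf '192.168.57.42\n192.168.57.43\n' > /etc/ctdb/nodes
root@cluster-01:~# scp /etc/ctdb/nodes cluster-02:/etc/ctdb/nodes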
The file public_addresses
Every time CTDB starts, it will provide an IP address to each node in the CTDB cluster; this must be an IP address from the production network.
After starting the cluster, CTDB will take care of those IP addresses and will give an IP address from this list to every CTDB node. If a CTDB node crashes, CTDB will assign the IP address from the crashed node to another CTDB node. So every IP address from this file is always assigned to one of the nodes.
CTDB is doing the failover for the services. If one node fails, its IP address will switch to one of the remaining nodes. All clients will then reconnect to this node. That's possible because all nodes have the session information of all clients.
For each node you need a public_addresses file. The files can be different on the nodes, depending on which subnet you would like to assign the node to. The example uses just one subnet, so both nodes have identical public_addresses files. Here you see the content of the file:
192.168.56.101/24 enp0s8
192.168.56.102/24 enp0s8
Starting CTDB the first time
Now that you have configured the CTDB service on both nodes, you are ready for the first start. To see what happens during the start, you can open another terminal and run tail -f /var/log/ctdb/ctdb.log to follow the messages. First start one node with systemctl restart ctdb, look at the log messages, and then start the second node while still keeping an eye on the log.
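The first start on one node could look like this (a sketch using the commands just mentioned, run in two terminals):

# terminal 1: follow the CTDB log
root@cluster-01:~# tail -f /var/log/ctdb/ctdb.log
# terminal 2: start CTDB on the first node
root@cluster-01:~# systemctl restart ctdb

In the log you should then see messages similar to the following: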
2020/02/11 17:32:53.778637 ctdbd[1926]: monitor event OK - node re-enabled
2020/02/11 17:32:53.778831 ctdbd[1926]: Node became HEALTHY. Ask recovery master to reallocate IPs
2020/02/11 17:32:53.779152 ctdb-recoverd[1966]: Node 0 has changed flags - now 0x0 was 0x2
2020/02/11 17:32:54.575970 ctdb-recoverd[1966]: Unassigned IP 192.168.56.102 can be served by this node
2020/02/11 17:32:54.576047 ctdb-recoverd[1966]: Unassigned IP 192.168.56.101 can be served by this node
2020/02/11 17:32:54.576254 ctdb-recoverd[1966]: Trigger takeoverrun
2020/02/11 17:32:54.576780 ctdb-recoverd[1966]: Takeover run starting
2020/02/11 17:32:54.594527 ctdbd[1926]: Takeover of IP 192.168.56.102/24 on interface enp0s8
2020/02/11 17:32:54.595551 ctdbd[1926]: Takeover of IP 192.168.56.101/24 on interface enp0s8
2020/02/11 17:32:54.843175 ctdb-recoverd[1966]: Takeover run completed successfully
Here you can see that the node has taken over both dynamic IP addresses; you can check this with ip a l enp0s8.
Note: Don't use ifconfig to list the IP addresses; ifconfig will not show the dynamically assigned IP addresses.
Before you start the second node, take a look at the CTDB status with ctdb status. You will see that the first node you have just started has the status OK, while the other node has the status DISCONNECTED|UNHEALTHY|INACTIVE.
root@cluster-01:~# ctdb status
Number of nodes:2
pnn:0 192.168.57.42 OK (THIS NODE)
pnn:1 192.168.57.43 DISCONNECTED|UNHEALTHY|INACTIVE
Generation:1636031659
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
Now you can start CTDB on the second node with systemctl restart ctdb. In the log on the first node you will see the message that the takeover was successful. Next you see the last lines from the log:
2020/02/11 17:51:49.964668 ctdb-recoverd[6598]: Takeover run starting
2020/02/11 17:51:50.004374 ctdb-recoverd[6598]: Takeover run completed successfully
2020/02/11 17:51:59.061780 ctdb-recoverd[6598]: Reenabling recoveries after timeout
2020/02/11 17:52:04.632267 ctdb-recoverd[6598]: Node 1 has changed flags - now 0x0 was 0x2
2020/02/11 17:52:04.989395 ctdb-recoverd[6598]: Takeover run starting
2020/02/11 17:52:05.008763 ctdbd[6554]: Release of IP 192.168.56.102/24 on interface enp0s8 node:1
2020/02/11 17:52:05.154588 ctdb-recoverd[6598]: Takeover run completed successfully
If you see a lot of messages like the ones in the next listing, check whether the gluster volume is mounted correctly and whether the recovery lock option in /etc/ctdb/ctdb.conf is set correctly:
2020/02/11 17:51:00.883523 ctdbd[6554]: CTDB_WAIT_UNTIL_RECOVERED
2020/02/11 17:51:00.883630 ctdbd[6554]: ../../ctdb/server/ctdb_monitor.c:324 wait for pending recoveries to end. Wait one more second.
A look at the status will show both nodes OK, as you can see in ctdb status:
root@cluster-01:~# ctdb status
Number of nodes:2
pnn:0 192.168.57.42 OK (THIS NODE)
pnn:1 192.168.57.43 OK
Generation:101877096
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:0
Whenever you stop a node, the other node will take over the IP address assigned by CTDB. Now CTDB is running, but you still have no service configured. You should only continue configuring the services if both nodes are healthy.
As a test you can stop CTDB on one node and you will see that the other node takes over the IP from the stopped node. As soon as you restart the node, one of the IP addresses will be assigned to it and both nodes will have the status OK again.
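Such a failover test could look like this (a sketch using the prompts and interface from the examples above; the exact output will differ):

root@cluster-02:~# systemctl stop ctdb
root@cluster-01:~# ctdb status
# node 1 should now be shown as DISCONNECTED|UNHEALTHY|INACTIVE
root@cluster-01:~# ip a l enp0s8
# node 0 should now hold both public IP addresses
root@cluster-02:~# systemctl restart ctdb
root@cluster-01:~# watch ctdb status
# wait until both nodes show the status OK again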
The next step will be setting up samba.
Configuring Samba
In this part you see how to set up a Samba cluster to provide file services to Windows and Linux clients. The cluster will be joined to a Samba Active Directory domain and shares will be created. Three different techniques to serve the shares to the clients will be described:
- Using the local mounted gluster-volume to create the share.
- Using the vfs-module glusterfs to directly point to the volume on the gluster-cluster without mounting the volume on the local system.
- Using the new vfs-module glusterfs_fuse.
Joining the domain
To join the cluster to the domain you have to do the following steps:
- Creating the DNS-records
- Configure Samba via registry
- Join the cluster
Creating the DNS-records
To join the domain, the first step should be creating the DNS-entries for the cluster. If you already have a reverse zone configured you can skip the command to create the reverse zone.
root@addc-01:~# kinit administrator
administrator@EXAMPLE.NET's Password: ********
root@addc-01:~# samba-tool dns zonecreate addc-01 56.168.192.in-addr.arpa -k yes
Zone 56.168.192.in-addr.arpa created successfully
root@addc-01:~# systemctl restart samba-ad-dc
You only have to restart the DC if you are using the internal DNS server of Samba.
root@addc-01:~# samba-tool dns add addc-01 example.net cluster A 192.168.56.101
Record added successfully
root@addc-01:~# samba-tool dns add addc-01 example.net cluster A 192.168.56.102
Record added successfully
root@addc-01:~# samba-tool dns add addc-01 56.168.192.in-addr.arpa 101 PTR cluster.example.net
Record added successfully
root@addc-01:~# samba-tool dns add addc-01 56.168.192.in-addr.arpa 102 PTR cluster.example.net
Record added successfully
Test the name resolution:
root@addc-01:~# host cluster
cluster.example.net has address 192.168.56.101
cluster.example.net has address 192.168.56.102
root@addc-01:~# host 192.168.56.101
101.56.168.192.in-addr.arpa domain name pointer cluster.example.net.
root@addc-01:~# host 192.168.56.102
102.56.168.192.in-addr.arpa domain name pointer cluster.example.net.
In the listing you see that both dynamic IP addresses get the same DNS name. Resolving the hostname will give you both IP addresses, so a client can connect to either of the addresses and will always reach one of the cluster nodes.
Note: A client always connects to the cluster and not to a specific host.
Configuring Samba
If you use Samba together with CTDB you have to use the registry to configure Samba. The reason is that you configure the cluster and not every single host. All configuration of the cluster can be done on any one of the Samba hosts. The CTDB Samba hosts share all TDB files. The files are stored locally but kept consistent across all nodes.
The easiest way to write the settings into the registry is to write a file in smb.conf style and then import the file into the registry.
If you start Samba, the first place where Samba looks for any configuration is the smb.conf file. This is the reason why you always need a minimal smb.conf. In the following listing you see the settings for the configuration that will later be imported into the registry:
[global]
    workgroup = EXAMPLE
    realm = EXAMPLE.NET
    netbios name = CLUSTER
    security = ADS
    template shell = /bin/bash
    winbind use default domain = Yes
    winbind refresh tickets = Yes
    idmap config *:range = 10000-19999
    idmap config samba-ad:range = 1000000-1999999
    idmap config samba-ad:backend = rid
The next step is to create the smb.conf that tells Samba that clustering should be used and that all further configuration comes from the registry:
[global]
    clustering = yes
    include = registry
Note: Before you join the cluster, make sure that you are using the domain controller as DNS server. Check /etc/resolv.conf for the IP address of your domain controller on both nodes.
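A quick way to verify this on both nodes (a sketch; the output must list the IP address of your domain controller as nameserver):

root@cluster-01:~# grep nameserver /etc/resolv.conf
root@cluster-02:~# grep nameserver /etc/resolv.conf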
Now import the configuration file into the registry with the command net conf import /daten/smb.conf.first. You can test the import with net conf list on both nodes, and test all settings with testparm, also on both nodes. If you get no error messages, you can join the cluster into the domain:
root@cluster-01:/etc/ctdb# net ads join -U administrator
Enter administrator's password: *******
Using short domain name -- EXAMPLE
Joined 'CLUSTER' to dns domain 'example.net'
Not doing automatic DNS update in a clustered setup.
root@cluster-01:~# net ads testjoin
Join is OK
A DNS update during the join is not possible for a cluster; that's why the DNS records were created before joining the cluster. But as you can see with net ads testjoin, the join is valid. Another test is to check whether the computer account for the cluster was created in AD; to do so, run samba-tool computer list on your domain controller.
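On the domain controller this check could look like this (a sketch; the full output depends on the other computer accounts in your domain):

root@addc-01:~# samba-tool computer list | grep -i cluster
CLUSTER$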
Now you have joined the cluster to the domain. Up to this point you haven't started any of the Samba services. Starting and stopping the services smbd, nmbd and winbind should be done by CTDB and not via systemd. So before you configure CTDB to take care of the services, you have to stop and disable the services in systemd:
root@cluster-01:~# systemctl stop smbd nmbd winbind
root@cluster-01:~# systemctl disable smbd nmbd winbind
root@cluster-02:~# systemctl stop smbd nmbd winbind
root@cluster-02:~# systemctl disable smbd nmbd winbind
As you can see, you have to do it on both nodes.
Configure CTDB to take care of Samba-services
After you have done the configuration of Samba and joined the cluster to your domain, you can start configuring CTDB to take care of the Samba services. Since Samba 4.9 the way to configure CTDB has changed a lot; up to this point you have only noticed this while you were configuring the recovery lock. The old way to configure the services was activating them in ctdb.conf. Starting with Samba 4.9 that has changed: now you have a command to activate the services.
Every service is enabled via an event script. To enable the scripts you use the ctdb command. We need the Samba script and the winbind script. In the next listing you see the commands to enable the scripts:
root@cluster-01:~# ctdb event script enable legacy 50.samba
root@cluster-01:~# ctdb event script enable legacy 49.winbind
root@cluster-02:~# ctdb event script enable legacy 50.samba
root@cluster-02:~# ctdb event script enable legacy 49.winbind
After enabling both 50.samba and 49.winbind, restart CTDB with systemctl restart ctdb on both nodes.
After a while both CTDB nodes become OK again. Checking with ps ax | egrep 'mbd|winbind' and ss -tlpn you will see that the Samba services are running. Now you have configured CTDB to take care of starting and stopping all Samba services.
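Such a check could look like this (a sketch; PIDs and the exact socket list will differ on your systems):

# the smbd, nmbd and winbind processes should be running on both nodes
root@cluster-01:~# ps ax | egrep 'mbd|winbind'
# smbd should be listening on the SMB ports 139 and 445
root@cluster-01:~# ss -tlpn | egrep ':139|:445'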
What happens when you execute the command? All event scripts are located in /usr/share/ctdb/events/legacy. If you activate a script, it will be linked to /etc/ctdb/events/legacy/. If you take a look at such a script, you will see that it looks like an init script of a service, and that's all it is.
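You can verify this by listing the directory (a sketch; the listing depends on which scripts you have enabled):

root@cluster-01:~# ls -l /etc/ctdb/events/legacy/
# every enabled script shows up as a symlink pointing into /usr/share/ctdb/events/legacy/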
Checking the CTDB-cluster
After your CTDB-cluster is running and the Samba-services are active let's take a look at some CTDB-tests.
ctdb status
Shows the current status of the whole cluster. All nodes are listed and you can see the status of each node.
Note: If you restart one or more nodes, run watch ctdb status so that you see a refreshed status list every two seconds.
root@cluster-02:~# ctdb status
Number of nodes:2
pnn:0 192.168.57.42 OK
pnn:1 192.168.57.43 OK (THIS NODE)
Generation:1823539022
Size:2
hash:0 lmaster:0
hash:1 lmaster:1
Recovery mode:NORMAL (0)
Recovery master:1
What do you see here:
- The number of nodes in the cluster.
- A list of all nodes with the pnn number, the IP address and the status of each node.
- Generation is just a number which changes whenever a reconfiguration takes place. It has no special meaning.
- The size of the cluster, here 2 nodes.
- The lines hash:<n> lmaster:<n> are used to calculate the lmaster; it is calculated via a hash value.
- The Recovery mode shows whether everything in the cluster is good (NORMAL) or a recovery is taking place (RECOVERY).
- The Recovery master is the node which is responsible for a recovery.
If you just want to see the status of all nodes, a single node, or a set of nodes, you can use ctdb nodestatus. Without an argument nodestatus shows the status of the current node; you can also pass a list of node numbers or all:
root@cluster-01:~# ctdb nodestatus
pnn:0 192.168.57.42 OK (THIS NODE)

root@cluster-01:~# ctdb nodestatus 1
pnn:1 192.168.57.43 OK

root@cluster-01:~# ctdb nodestatus 0,1
pnn:0 192.168.57.42 OK (THIS NODE)
pnn:1 192.168.57.43 OK

root@cluster-01:~# ctdb nodestatus all
Number of nodes:2
pnn:0 192.168.57.42 OK (THIS NODE)
pnn:1 192.168.57.43 OK
There are some more possibilities to check the cluster:
root@cluster-01:~# ctdb uptime
Current time of node 0 : Wed Feb 12 18:56:14 2020
Ctdbd start time : (000 04:53:49) Wed Feb 12 14:02:25 2020
Time of last recovery/failover: (000 04:53:43) Wed Feb 12 14:02:31 2020
Duration of last recovery/failover: 0.573115 seconds

root@cluster-01:~# ctdb listnodes
192.168.57.42
192.168.57.43

root@cluster-01:~# ctdb ping
response from 0 time=0.000124 sec (20 clients)

root@cluster-01:~# ctdb ip
Public IPs on node 0
192.168.56.101 1
192.168.56.102 0
For more information about testing a CTDB cluster see the manpage of ctdb.
Check all services
The next important check is to see which services CTDB can provide and which services are actually configured. You could use the command ctdb scriptstatus to list all the running services, but this command is deprecated. Since Samba 4.9 you should check the event scripts with ctdb event status legacy monitor. Then you will see all the running (monitored) services. To see the list of all services, run ctdb event script list legacy.
root@cluster-01:~# ctdb event status legacy monitor
00.ctdb              OK         0.005 Thu Feb 13 18:28:35 2020
01.reclock           OK         0.023 Thu Feb 13 18:28:35 2020
05.system            OK         0.017 Thu Feb 13 18:28:35 2020
10.interface         OK         0.019 Thu Feb 13 18:28:35 2020
49.winbind           OK         0.011 Thu Feb 13 18:28:35 2020
50.samba             OK         0.101 Thu Feb 13 18:28:35 2020
root@cluster-01:~# ctdb event script list legacy
* 00.ctdb
* 01.reclock
* 05.system
  06.nfs
* 10.interface
  11.natgw
  11.routing
  13.per_ip_routing
  20.multipathd
  31.clamd
  40.vsftpd
  41.httpd
* 49.winbind
* 50.samba
  60.nfs
  70.iscsi
  91.lvs
The first command shows all running services. The second command shows all services CTDB can provide; all enabled services are marked with a *.
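If you later want CTDB to manage an additional service, you enable its event script in the same way as 50.samba and 49.winbind above and restart CTDB; disabling works analogously (a sketch, using 60.nfs from the list above only as an example):

root@cluster-01:~# ctdb event script enable legacy 60.nfs
root@cluster-01:~# systemctl restart ctdb
# and to remove a service from CTDB's control again:
root@cluster-01:~# ctdb event script disable legacy 60.nfs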