Back up and Restoring a Samba AD DC

From SambaWiki
Revision as of 09:39, 17 March 2021 by Hortimech (talk | contribs) (/* Removed portion referring to a fixed problem.)



Overview


Samba backups provide a way to recover your Samba AD domain, in the unlikely event that it suffers a catastrophic failure. They do not back up individual DCs.

If you are also using a DC as a fileserver (not recommended), you will also need to create separate backups of this data.

You only need to back up the domain data on one DC, but you can back up the domain data on all DCs for redundancy. Just remember that you will only use one of the backups to recreate your domain.

There are a few different flavours of backup, which work in different ways and achieve different things:

  • Online. This takes a clone of a running DC's database. It's similar in functionality to joining a new DC to the network.
  • Offline (or 'local'). This backs up the Samba files as they appear on disk. This includes replication meta-data that's local to this particular DC and which is not included in online backups. It can also create a backup file when the DC is offline (i.e. the samba process is not actually running).
  • Rename. This produces a backup file with the domain renamed, and is intended as only a temporary replacement.

These different types of backup have several things in common. The backups are all produced using a variant of the 'samba-tool domain backup' command. Each command creates a .tar.bz2 backup-file which contains an entire backup of the domain (based on a given DC). The backup-file can then be used to restore the domain using the 'samba-tool domain backup restore' command.




Recovering a single DC

The AD domain database is distributed across the DCs. There is one overall domain database containing all your user accounts, and each DC has its own local copy. Replication happens periodically in the background to keep each DC's database in sync. When you run the 'samba-tool domain join dc' command on an existing DC, the DC's local database gets overwritten with a new copy of the domain database. This is called rejoining a DC.

Sometimes you may end up with a DC that is not operating correctly when the rest of the domain is working fine. This may be due to a local database corruption, or a software bug may mean database objects were not replicated to the DC correctly. Each DC also has some local database information that does not get replicated, which could also be the cause of the problem.

In such a case, if you still have a working domain, then you can recover any DC by simply rejoining it to the domain. It will replace its bad local copy of the database with a new copy that should hopefully be free from errors. You should never use the 'samba-tool domain backup/restore' commands to recover an individual DC.


An example use case of restoring a domain

Let's say an admin inadvertently modifies or deletes an object in the database that breaks an AD service. This database change gets replicated out to all DCs, so now the service is broken across the whole domain. You can't just replace a DC to fix the problem, because the new DC will just end up with another distributed copy of the same broken database. If you don't know which one of the thousands of database objects is incorrect, the simplest solution might be to roll back to a known working copy of the domain database. Luckily you make regular backups.

This is where the 'samba-tool domain backup restore' command comes in. You can't have two separate copies of the same domain database, so first you need to stop all the DCs that are using the broken copy of the domain database. Next, you restore a new/repurposed DC with the backed-up 'good' copy of the domain database. Then you rejoin the DCs and they receive the good copy of the database as well.

Note that you have to rejoin all the DCs - you can't just restart samba on the old DCs. Restarting samba (rather than rejoining) will mean the DC still uses the old broken copy of the database. You will then have two different domain databases in use, DCs operating at cross-purposes, and the whole thing will be a complete mess.


So which backup should I use?

This is a difficult question to answer, as it somewhat involves predicting how your Samba domain will fail. It also depends on how you plan to react to the failure. By creating a backup, you may be trying to achieve one of several different objectives:

  • Provide temporary replacement DC(s), which will buy you time while you troubleshoot and fix the problems on your original Samba network.
  • Provide a long-term replacement for your domain, i.e. you plan to completely discard your failed domain and rollback to the last backup taken.
  • Provide forensics on why the domain failure occurred in the first place, e.g. if a database corruption occurred, what factors may have led to the problem?
  • Provide a 'good working state' database that can be used as a guide for repairing and recovering deleted or corrupted objects in the production database.

The rename backup works best for providing a temporary replacement domain. However, it is no good as a long-term replacement for the domain. Normally you can't run restored DCs and troubleshoot the failed domain DCs at the same time, because both will be using the same domain name (which will cause even more wild and weird problems to appear in your network). Because the renamed backup uses a different domain name, you can safely run both the restored domain and the original domain in parallel.

The online backup works best for providing a long-term replacement for your domain. It can be used as a temporary replacement, but you cannot easily troubleshoot the failed domain DCs at the same time. Investigating the reason behind the failure of a Samba domain could be time-consuming and technically complex work. If, in the event of a catastrophic domain failure, you don't intend to investigate the problem and just want to revert to an older copy of a working domain, then this is the best option for you.

The offline backup works best for forensics purposes, as it contains additional data that is not normally replicated. It can also be quicker to run an offline backup for a large domain, as you don't have the extra overhead of sending the database over the network and re-writing it to disk. However, note that the offline backup-file contains all the sensitive/secret information for your entire domain, so you still may not be able to share this backed-up information easily with Samba developers or mailing-lists. Also note that because the backup is copying the actual database on disk, there is potentially more chance that a corruption in the database is copied through to the backup.

If you want to create multiple different types of backups, note that the details of the backup are saved in a 'backup.txt' file within the backup-file itself.
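Because each backup-file records its own details, one way to tell several backup-files apart is to print that file without unpacking the whole archive. The following is a minimal sketch rather than an official tool: the function name show_backup_details is hypothetical, and it assumes a tar that detects the compression itself when reading (GNU tar and bsdtar do). It looks the member up by name instead of assuming its exact path inside the archive.

```shell
# Print the backup.txt stored inside a Samba backup-file.
# show_backup_details is a hypothetical helper name, not part of samba-tool.
show_backup_details() {
    backup_file=$1
    # Find the member first: the exact path prefix inside the archive is not assumed.
    member=$(tar -tf "$backup_file" | grep -E '(^|/)backup\.txt$' | head -n 1)
    if [ -z "$member" ]; then
        echo "no backup.txt found in $backup_file" >&2
        return 1
    fi
    # -O writes the member to stdout instead of extracting it to disk.
    tar -xOf "$backup_file" "$member"
}
```

For example, show_backup_details <tar-file> prints whatever the backup command recorded for that backup-file.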



Testing the backup restoration

An often overlooked step in any disaster recovery plan is testing that your recovery system actually works before you need to put it into practice. Creating the backup-file is not much good if samba fails to start up on your restored DC.

Note that the backup/restore commands are new to Samba v4.9, and there may be something specific to your network which means they don't work as well as intended. It's better to find any problems now (and report them to the Samba mailing-list), rather than in an emergency.

Unfortunately, testing a Samba backup is hard to do. Because the online and offline backups use the exact same domain information as your production network, it is not safe to run samba on a restored backup - the restored DC will interfere with your production network traffic.

Here are a few options to help you gain confidence that your Samba backup will actually be useful.

  • If you're generating a 'rename' backup, then it's always safe to restore the backup-file onto a new DC and start samba on it. So you can always sanity-check that your rename backup works successfully. A rename backup is quite similar to an online backup, so if your rename backup works, then you can have some confidence that your online backup would probably work too.
  • It's always safe to restore the backup-file to disk, as long as you don't actually start samba on the restored database. You could then use tools like ldbsearch (by specifying the local filepath of the database on disk) to sanity-check database objects were backed up correctly.
  • If you just want to get a better feel for what the various backup options do, one option is to create a samba lab-domain, and then create and restore online/offline backups within the lab domain itself.
  • Another option is to configure your network switches to completely isolate the restored DC from the rest of your Samba network, although in practice this proves quite tricky to get right.

Note that the backup tool creates a 'warts and all' copy of the domain database. If you don't run samba-tool dbcheck over your Samba database regularly, then it's worth getting into the habit, especially before you generate a new backup. Otherwise, any problems in your database will persist right through a backup and restore of your domain. Dbcheck will report any obvious problems that exist in your database, and provides a '--fix' option that will resolve the problems for you.
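The habit described above can be sketched as a small wrapper that runs dbcheck first and only takes the backup if the check passes. This is an illustration under assumptions, not a recommended procedure from the Samba team: the function name checked_offline_backup and the SAMBA_TOOL override variable are hypothetical, and the sketch assumes that 'samba-tool dbcheck' exits with a non-zero status when it finds problems.

```shell
# Allow the samba-tool binary to be overridden (hypothetical convenience variable).
SAMBA_TOOL=${SAMBA_TOOL:-samba-tool}

# Run dbcheck first and only take the backup if it reports no problems.
# checked_offline_backup is a hypothetical wrapper name.
checked_offline_backup() {
    target_dir=$1
    # Assumption: dbcheck returns non-zero when it finds database errors.
    if ! "$SAMBA_TOOL" dbcheck; then
        echo "dbcheck reported problems; review them (or try 'dbcheck --fix') before backing up" >&2
        return 1
    fi
    "$SAMBA_TOOL" domain backup offline --targetdir="$target_dir"
}
```

Run it as root on the DC, e.g. checked_offline_backup <output-dir>. Stopping on a failed check is a deliberate choice here, so that a known-bad database is not silently carried into a new backup-file.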



Online DC backup

To create an online backup, use:

sudo samba-tool domain backup online --targetdir=<output-dir> --server=<DC-server> -UAdministrator

This command can be run locally on the DC or remotely on another machine. If running the command remotely, you may want to specify a --configfile option so that the correct smb.conf settings get included in the backup (i.e. the local smb.conf file may not exist, or its settings may be different to your domain DCs).
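As a rough sketch of the remote case, the wrapper below builds the online backup command and only adds --configfile when an smb.conf path is supplied. The function name online_backup and the SAMBA_TOOL override variable are hypothetical; the options themselves are the ones described above.

```shell
# Allow the samba-tool binary to be overridden (hypothetical convenience variable).
SAMBA_TOOL=${SAMBA_TOOL:-samba-tool}

# Take an online backup of DC "$2" into directory "$1".
# An optional third argument names the smb.conf to include in the backup,
# which matters when the command runs on a machine other than the DC.
# online_backup is a hypothetical wrapper name.
online_backup() {
    target_dir=$1
    dc_server=$2
    smb_conf=${3:-}
    # Rebuild the argument list for samba-tool step by step.
    set -- domain backup online --targetdir="$target_dir" --server="$dc_server" -UAdministrator
    if [ -n "$smb_conf" ]; then
        set -- "$@" --configfile="$smb_conf"
    fi
    "$SAMBA_TOOL" "$@"
}
```

For instance, online_backup <output-dir> <DC-server> <path-to-smb.conf> would be run as root on the remote machine, and the third argument can be dropped when running on the DC itself.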


Offline/local DC backup

To create an offline backup, log in to the DC you're backing up, and simply specify the target-directory location to write the backup-file to. E.g.

sudo samba-tool domain backup offline --targetdir=<output-dir>

Note that despite this option's name, the DC does not actually need to be offline when running this command. The tool is simply backing up the local files and it has sufficient locking in place to ensure the backup is generated safely.

Note that while the other backup commands are available from Samba v4.9 onwards, the offline command is not included until Samba v4.10.
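If a script has to run on DCs with mixed Samba versions, it may help to check the version before attempting an offline backup. The helper below is a minimal sketch: supports_offline_backup is a hypothetical name, and it assumes a version string of the form "4.10.0" or "Version 4.10.0" (the latter being what 'samba --version' is assumed to print).

```shell
# Succeed if the given Samba version string is 4.10 or newer, i.e. new enough
# for 'samba-tool domain backup offline'. Accepts "4.10.0" or "Version 4.10.0".
# supports_offline_backup is a hypothetical helper name.
supports_offline_backup() {
    version=${1#Version }      # drop an optional "Version " prefix
    major=${version%%.*}       # text before the first dot
    rest=${version#*.}         # text after the first dot
    minor=${rest%%.*}          # text before the next dot
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 10 ]; }
}
```

A script could then guard the command with something like: supports_offline_backup "$(samba --version)" || echo "offline backup needs Samba v4.10 or later".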


Domain rename backup

For more details on creating a rename backup, see Domain rename tool.



Untarring backups

If you simply want to use the backup for forensic purposes (i.e. interrogating the details of specific database objects in a 'good' state), then it's safe to just untar the backup-file and query the database directly on disk. Note that you cannot run samba on an untarred backup - you must use the restore command to do this.
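A minimal sketch of the untar step follows. The function name untar_backup is hypothetical, and the sketch assumes a tar that detects the bzip2 compression itself when reading a regular file (GNU tar and bsdtar do).

```shell
# Unpack a backup-file into a scratch directory for read-only inspection.
# untar_backup is a hypothetical helper; never start samba on the result.
untar_backup() {
    backup_file=$1
    scratch_dir=$2
    mkdir -p "$scratch_dir" || return 1
    # -C extracts into the scratch directory instead of the current one.
    tar -xf "$backup_file" -C "$scratch_dir"
}
```

After untar_backup <tar-file> <scratch-dir>, the database can be queried on disk with ldbsearch, e.g. ldbsearch -H <scratch-dir>/private/sam.ldb '(sAMAccountName=<user>)'. The private/sam.ldb location is an assumption about the archive layout, so check the unpacked directory if the file is not there.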



Restoring the domain

In the event of a catastrophic domain failure, to restore the domain from backup-file you would do the following:

  1. Stop samba on all the old DCs. (Unless you're using a renamed backup, in which case you can skip this step).
  2. Run the 'samba-tool domain backup restore' command to restore the domain database on a single new DC. See below for more details.
  3. Start samba on the new DC.
  4. Re-add the old DCs back to the network by joining them to the restored DC, e.g.
    samba-tool domain join <dns-realm> DC --server=<restored-dc>
  5. If you're using a renamed backup, you would then need to re-configure your network appliances so that traffic is redirected to the restored domain, instead of the failed/original domain.
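Step 4 above has to be repeated on every old DC, so it can be convenient to wrap the join in a small function. This is only a sketch: rejoin_to_restored_dc and the SAMBA_TOOL override variable are hypothetical names, and stopping samba beforehand is left out because the service name differs between distributions.

```shell
# Allow the samba-tool binary to be overridden (hypothetical convenience variable).
SAMBA_TOOL=${SAMBA_TOOL:-samba-tool}

# Run on an old DC, after samba has been stopped on it, to join it to the
# restored DC. rejoin_to_restored_dc is a hypothetical wrapper name.
rejoin_to_restored_dc() {
    dns_realm=$1
    restored_dc=$2
    if [ -z "$dns_realm" ] || [ -z "$restored_dc" ]; then
        echo "usage: rejoin_to_restored_dc <dns-realm> <restored-dc>" >&2
        return 2
    fi
    "$SAMBA_TOOL" domain join "$dns_realm" DC --server="$restored_dc"
}
```

Refusing to run without both arguments avoids accidentally joining against whichever DC happens to be discovered first, which could be one that still holds the broken database.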




Restoring the backup-file

The step to restore a backup is similar to the 'domain provision' you did back when you first set up your Samba network, except this time the backup contains all the database objects you've added since then. Similar to doing a provision, you need to specify a new DC when you run the restore command. This new DC must not have existed previously in the Samba network. The restore command will look something like:

sudo samba-tool domain backup restore --backup-file=<tar-file> --newservername=<DC-name> --targetdir=<new-samba-dir>

Note that the target-directory specified must be empty (or non-existent). This means it's not practical to restore the domain database back into the default installation location (e.g. /usr/local/samba). Instead, we recommend that you restore the domain database into a different targetdir, and then use the '-s' (or '--configfile') option when running samba, e.g.

samba -s <targetdir>/etc/smb.conf

Specifying the restored smb.conf will mean that Samba will use the database files in the correct location.

The restored DC will be added to the 'Default-First-Site-Name' site. This site will be created in the restored DB if it does not already exist. You can specify an alternative site to add the restored DC to using the --site option.

Before starting samba on the restored DC, you should double-check the restored smb.conf settings are correct. It may also be helpful to run samba_dnsupdate (although this still gets run automatically when you start samba).
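The restore step can be sketched as a wrapper that refuses to run when the target directory is not empty, since the restore command requires an empty or non-existent directory. The function name restore_backup and the SAMBA_TOOL override variable are hypothetical; the options are the ones shown above.

```shell
# Allow the samba-tool binary to be overridden (hypothetical convenience variable).
SAMBA_TOOL=${SAMBA_TOOL:-samba-tool}

# Restore a backup-file onto a new DC name in an empty target directory.
# restore_backup is a hypothetical wrapper name.
restore_backup() {
    backup_file=$1
    new_server_name=$2
    target_dir=$3
    # Refuse early if the target directory exists and is not empty.
    if [ -d "$target_dir" ] && [ -n "$(ls -A "$target_dir")" ]; then
        echo "target directory $target_dir is not empty" >&2
        return 1
    fi
    "$SAMBA_TOOL" domain backup restore --backup-file="$backup_file" \
        --newservername="$new_server_name" --targetdir="$target_dir"
}
```

Run as root, restore_backup <tar-file> <DC-name> <new-samba-dir> would be followed by a review of <new-samba-dir>/etc/smb.conf and then samba -s <new-samba-dir>/etc/smb.conf.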



Recommended strategy

Restoring the backup-file has a couple of minor annoyances:

  • Having to use a different directory to the default installation location.
  • Having to specify a different DC server-name to what was previously in your network.

The simple way to minimize these annoyances is to use a temporary server (or VM) for your restored DC, i.e.:

  • Restore the backup-file onto the temporary DC and start Samba.
  • Rejoin the original DCs to the temporary DC one at a time. You can re-use the same server-name and default installation location during the join.
  • Once all the original DCs have joined the restored domain, you can remove the temporary DC (i.e. using 'samba-tool domain demote'). Your network of DCs should now be exactly the same as it was previously (except hopefully now in a valid working state!).



Troubleshooting

Note that if the backup or restore commands unexpectedly throw an error, they may leave behind a temporary directory in the --targetdir you specified. This directory may help to provide clues as to why the failure occurred. If you're running the restore command, then you will need to clean it up manually before re-running the command.


Creating Backups

  • Note that you should run the backup as root. Online backups can actually succeed as a non-root user, but it will cause you headaches later when you try to restore it.
  • For 'online' or 'rename' backups, sanity-check that the credentials and server details you're using are correct. E.g. try:
ldbsearch -H ldap://<server> -UAdministrator
  • Try increasing the debug-level to see if that tells you more about the failure. E.g. add the --debug=3 option to the command.
  • The 'online' and 'rename' commands work in a very similar way to joining a DC. If joining a DC is known to fail on your network, then these commands are unlikely to work either. If you see the 'Committing SAM database' and 'Cloned domain <domain>' messages, then the join-like part of the backup has likely succeeded.
  • The backup tools do not work reliably against a Windows DC (in most cases, backing up the sysvol files fails due to a lock held by the DFSR service). If you have a mixed DC domain, then back up a Samba DC rather than a Windows DC. If you are in a native Windows domain, you can temporarily stop the DFSR "DFS Replication" service on the DC you want to back up, for the duration of the backup.
  • An unusual corner-case is where you try to back up a brand new DC before it has been allocated a RID, which results in an error like the one below. Most users would be unlikely to ever see this, but if you do, the solution is to add a temporary user (i.e. samba-tool user create), just to force a RID allocation.
samba-tool domain backup offline --targetdir=/home/ubuntu/backup<ldb result>
ERROR(<type 'exceptions.IndexError'>): uncaught exception - list index out of range
  File "/usr/local/samba/lib/python2.7/site-packages/samba/netcmd/__init__.py", line 184, in _run
    return self.run(*args, **kwargs)
  File "/usr/local/samba/lib/python2.7/site-packages/samba/netcmd/domain_backup.py", line 941, in run
    sid = get_sid_for_restore(samdb)
  File "/usr/local/samba/lib/python2.7/site-packages/samba/netcmd/domain_backup.py", line 80, in get_sid_for_restore
    rid = int(res[0].get('rIDNextRID')[0])
  • Note that offline backups require the lmdb-utils package installed, otherwise it throws an exception trying to run mdb_copy(). E.g.
ERROR(<type 'exceptions.OSError'>): uncaught exception - [Errno 2] No such file or directory
  File "bin/python/samba/netcmd/__init__.py", line 184, in _run
    return self.run(*args, **kwargs)
  File "bin/python/samba/netcmd/domain_backup.py", line 982, in run
    self.backup_smb_dbs(paths.private_dir, samdb, lp, logger)
  File "bin/python/samba/netcmd/domain_backup.py", line 907, in backup_smb_dbs
    copy_function(sam_file)
  File "bin/python/samba/netcmd/domain_backup.py", line 866, in offline_mdb_copy
    mdb_copy(path, path + self.backup_ext)
  File "bin/python/samba/mdb_util.py", line 35, in mdb_copy
    status = subprocess.check_call(mdb_copy_cmd, close_fds=True, shell=False)
  File "/usr/lib64/python2.7/subprocess.py", line 185, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
  • Running the backup against a Windows DC generally fails during the backup of the sysvol share. You may see an error like the following:
INFO 2019-01-15 15:29:11,850 pid:208088 bin/python/samba/join.py #1554: Cloned domain WINDOWS2012R2 (SID S-1-5-21-886655096-618523297-2770022155)
ERROR(runtime): uncaught exception - (3221225539, 'A file cannot be opened because the share access flags are incompatible.')
  File "bin/python/samba/netcmd/__init__.py", line 184, in _run
    return self.run(*args, **kwargs)
  File "bin/python/samba/netcmd/domain_backup.py", line 264, in run
    backup_online(smb_conn, sysvol_tar, remote_sam.get_domain_sid())
  File "bin/python/samba/ntacls.py", line 512, in backup_online
    ntacl_sddl_str = smb_helper.get_acl(r_name, as_sddl=True)
  File "bin/python/samba/ntacls.py", line 334, in get_acl
    smb_path, SECURITY_SECINFO_FLAGS, SECURITY_SEC_FLAGS)

The problem seems to be accessing the DfsrPrivate\Deleted and DfsrPrivate\Installing folders. It seems to be caused by a lock held by the DFSR service.

To avoid this error, stop the DFSR "DFS Replication" service on the DC you want to back up before running the backup. Don't forget to restart the service once the backup is completed.



Restoring backups

  • The restore command needs to be run as root. Part of the backup process involves preserving and restoring the NTACLs of the sysvol files, and some of the file operations involved require root permissions. If you're not root, then you'll probably see an error like:
ERROR(<type 'exceptions.TypeError'>): uncaught exception - (1, 'Operation not permitted')
  File "bin/python/samba/netcmd/__init__.py", line 177, in _run
    return self.run(*args, **kwargs)
  File "bin/python/samba/netcmd/domain_backup.py", line 520, in run
    backup_restore(sysvol_tar, dest_sysvol_dir, samdb, smbconf)
  File "bin/python/samba/ntacls.py", line 589, in backup_restore
    ntacls_helper.setntacl(dst, ntacl_sddl_str)
  File "bin/python/samba/ntacls.py", line 445, in setntacl
    return setntacl(self.lp, path, ntacl_sd, self.dom_sid)
  File "bin/python/samba/ntacls.py", line 214, in setntacl
    ndr_pack(ntacl))

OR

PANIC (pid 26958): Security context active token stack 
  • The --newservername you specify for the restored DC must not already exist in the original domain. Otherwise you'll see an error like:
Adding CN=PROD-DC,OU=Domain Controllers,DC=lab,DC=example,DC=com
ERROR(ldb): uncaught exception - Entry CN=PROD-DC,OU=Domain Controllers,DC=lab,DC=example,DC=com already exists
  File "bin/python/samba/netcmd/__init__.py", line 177, in _run
    return self.run(*args, **kwargs)
  File "bin/python/samba/netcmd/domain_backup.py", line 439, in run
    ctx.join_add_objects(specified_sid=dom_sid(sid))
  File "bin/python/samba/join.py", line 625, in join_add_objects
    ctx.samdb.add(rec, controls=controls)
  • If the backup command was run locally on the DC, then the backup-file should contain the DC's smb.conf. However, the smb.conf in the backup-file may contain 'interfaces' configuration that doesn't match the IP addresses on the DC you're restoring on. If this happens, you'll get an error like:
Looking up IPv4 addresses
WARNING: no network interfaces found
No IPv4 address will be assigned
Looking up IPv6 addresses
WARNING: no network interfaces found
No IPv6 address will be assigned
ERROR: Please specify a host-ip for the new server
You can avoid this problem by specifying a --host-ip argument during the restore. This should only affect rename backups.

Starting samba

If you ran the backup as a non-root user, the restore will still succeed, but you'll hit errors when you try to start samba. E.g.

sudo samba -s /tmp/online4/etc/smb.conf -i -d 3
samba version 4.10.0pre1-DEVELOPERBUILD started.
Copyright Andrew Tridgell and the Samba Team 1992-2018
...
directory_create_or_exist_strict: invalid ownership on directory /tmp/restore/private/msg.sock
exit_daemon: STATUS=daemon failed to start: Samba failed to setup parent messaging, error code -1073741801

You could chown your way out of this, but it's a lot simpler just to run the backup command as root.

FAQ

The bind-dns/ folder is empty. Why are no DNS Records saved?

The backup is a backup of the domain, but by default the restore will be configured for 'internal' DNS. Just use samba_upgradedns to change to DLZ_BIND9 if desired.

The private/tls folder is empty. Why aren't my self-created and samba-created certs backed up?

This is not a backup of a single DC, but of the replicated data in the domain. As such, per-server information is not backed up, and will need to be re-generated.

What user should I run backups as?

For online backups, you should be able to run as any AD user with Administrator privileges. Offline backups and restores involve reading/writing to the local file system, so would need root privileges.

However, as the backup-file created will contain all the domain's AD details, it's probably safest to always run the backup command as root, in order to restrict who can access the file.
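Beyond running the backup as root, one way to restrict access is to tighten the permissions on the backup directory and its backup-files afterwards. The helper below is an assumption-laden sketch, not something the backup tool does for you: lock_down_backups is a hypothetical name, and it assumes the backup-files keep the .tar.bz2 extension.

```shell
# Restrict a backup directory and the backup-files in it to their owner.
# lock_down_backups is a hypothetical helper name.
lock_down_backups() {
    backup_dir=$1
    chmod 700 "$backup_dir" || return 1
    for f in "$backup_dir"/*.tar.bz2; do
        # Skip the unexpanded pattern when no backup-file exists yet.
        [ -e "$f" ] || continue
        chmod 600 "$f" || return 1
    done
}
```

Calling lock_down_backups <output-dir> right after a backup leaves the directory readable only by the user that owns it, which is root if the backup was run as root.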