Running Samba AD Domain Controllers in large domains

DRS replication (and joining a DC)

The time it takes to DRS replicate is proportional to the size of the database and is one of the longest running operations that may be run against a domain controller. Not only does returning all the data across the network in the correct format take time; reformatting the responses and writing the results to disk also takes a significant amount of time. A join is also a serial operation (driven by a single RPC process), so joining two domain controllers simultaneously does not reduce the time it takes to prepare them.

As replication generally triggers a number of writes, it is recommended that the fastest storage possible is used. Even in the case where no meaningful changes will be written (a full synchronization on an already synchronized database), faster storage has a notable effect on the overall synchronization time.
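
For reference, a join is a single samba-tool invocation run on the new DC; the realm and credentials below are placeholders:

 samba-tool domain join samdom.example.com DC -U"SAMDOM\administrator"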

RID allocation

When replication takes a long time, the drepl_server process may be unable to keep up with its internal queue of pending replications. This can affect RID allocation, which uses the same replication machinery. Avoid attempting a full synchronization while adding users in bulk at the same time, otherwise the DC may run out of RIDs to allocate from its pool. The RID pool should eventually be refreshed, but in the meantime, operations that consume RIDs should be directed at a different domain controller.
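
For example, while one DC is busy with a full synchronization, RID-consuming operations such as user creation can be pointed at another DC with the -H option (the hostname and credentials below are placeholders):

 samba-tool user create someuser --random-password -H ldap://dc2.samdom.example.com -U"SAMDOM\administrator"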

Queued replications

Following a full synchronization of a large database, the drepl_server process may have accumulated a large number of pending notifications and pull requests. It may take some time to flush these operations, so user-triggered replications via samba-tool may not respond for a while. Using the --local option of samba-tool drs replicate is one way to avoid waiting; alternatively, restarting the Samba process will flush the in-memory queue.
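
For example, a replication that is applied directly by the samba-tool process, rather than being queued behind a busy drepl_server, might look like this (the DC names and partition DN are placeholders):

 samba-tool drs replicate DC1 DC2 dc=samdom,dc=example,dc=com --local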

Linked attributes

Linked attributes, such as the member attribute used for group membership, contribute a large portion of the overall synchronization time. Avoiding an excessive number of links can reduce the time required to replicate a database.

LMDB map size errors

Linked attribute processing in Samba 4.9 caused poor transaction memory behaviour with LMDB during a join, triggering a map size error when a large number of links were present. Samba 4.10 should address these issues, but increasing the map size limit may also be a sufficient workaround in smaller cases.
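
If a join on an affected version hits the map size limit, one possible workaround is to join with a larger LMDB store size. The --backend-store-size option below exists in recent samba-tool versions, but check samba-tool domain join --help on your version before relying on it; the size value is only an illustration:

 samba-tool domain join samdom.example.com DC -U"SAMDOM\administrator" --backend-store=mdb --backend-store-size=16Gb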

Running samba-tool dbcheck

Running the standard dbcheck on a large domain can take a very long time (on the order of days, when only checking the consistency and not fixing any issues). The most significant contributor to this time is linked attributes. Regardless of the size of the database, checking consistency rules is important.

The safest way to dbcheck a database (both to check for errors and to fix them) is while the Samba processes are all offline, because some checks may be interfered with by modifications on a live server. Local database modifications may also interfere with dbcheck, so make sure that no other local accesses are being made. When run with --fix and --yes, the operation is performed inside a transaction, which ensures that no other access to the database is possible; note that using this against a live server would be extremely unwise, as it would disrupt normal operations for a long period of time.
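
A full offline check and fix might look like the following; the service name varies between distributions and self-compiled installations:

 systemctl stop samba-ad-dc
 samba-tool dbcheck --cross-ncs --fix --yes
 systemctl start samba-ad-dc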

Skipping checks associated with the member attribute

In versions of dbcheck after 4.11, there will be a new option allowing a quick check of member linked attributes. In a large domain, member attributes may be very common, and running the full list of checks against them consumes far too much time. Since Samba 4.7, a number of the consistency issues associated with linked attributes should no longer be simple to trigger, which means that a noticeable number of the checks in dbcheck are highly unlikely to find any issues despite consuming a large amount of time.

Fixing a large number of dbcheck errors

As write operations can disrupt normal operations, it is possible to change the scope of what dbcheck inspects and restrict it to an LDAP subtree (base or one-level). This can even be applied to single objects in the database, so you can generate a list of distinguished names and then run a fix for each of them (ideally the tooling would generate such a list for you, but it does not currently have this capability). The same method can be used to restrict the scope when only checking the consistency rules, without applying fixes: you could generate a list of all distinguished names in the database and then trigger dbcheck on each to determine whether there might be an issue. Note, however, that modifications and consistency fixes made externally while dbcheck is checking (rather than fixing) can cause discrepancies in the object list and lead to unexpected results.
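
A minimal sketch of the scoped approach, assuming a default domain DN layout (the OU and object names are placeholders):

 # Check, without fixing, the direct children of one OU
 samba-tool dbcheck --scope=one "OU=Sales,DC=samdom,DC=example,DC=com"
 # Fix a single object identified in the previous pass
 samba-tool dbcheck --scope=base --fix --yes "CN=Some User,OU=Sales,DC=samdom,DC=example,DC=com"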

Subtree rename

Performance issues may arise when attempting to rename an object (e.g. an OU) that contains other objects (i.e. their DNs end with its DN). When the object is renamed, the other objects under that DN get renamed too, as well as their children, meaning potentially thousands of objects can be renamed at once.

The slowest part of this is fixing linked attributes. When each object is renamed, all the links that point to it need to be changed too. If the domain makes use of groups, group memberships are likely to be the bulk of the linked attributes. Starting from 4.11, databases using the sorted links feature will use a binary search for the group member attributes, but before that a linear search was necessary (the sorted links feature is the default for new databases since 4.7). For user memberOf attributes and other links which are maintained in the opposite direction, a linear search is still used (but normally less work is required).

For the following analysis we make a few assumptions:

1. The number of groups grows monotonically with the number of users.

2. There will be some very large groups.

3. It is unlikely for a user to be in all the groups (but the converse might be true).

Versions up to 4.10

This has pseudo-quadratic behaviour with the number of users in the domain. For each user that gets moved, all the groups that it is a member of need to be updated. It is likely that this includes some large groups which contain nearly all the users, and these are searched linearly.

For each group that is moved, the backlinks of all users must also be searched linearly, but this is less work because users tend to belong to only a few groups each.

Version 4.11+ with sorted links

For each user that gets moved, all the groups still need to be updated, but this can now be done with a quicker binary search. In very large domains, the speed-up can be many orders of magnitude. For each group that is moved, the users are still searched linearly, but the search stops early when it succeeds, making it roughly twice as fast in practice.
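
Whether a database was provisioned with the sorted links feature can be checked by looking at the @SAMBA_DSDB record in sam.ldb; the path below assumes a default install prefix, and the exact feature and attribute names should be verified against your own database:

 ldbsearch -H /usr/local/samba/private/sam.ldb -b '@SAMBA_DSDB' -s base

Databases created with the feature enabled should list sortedLinks among the reported features.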

wbinfo and other winbind-related calls can take a long time (or don't work)

winbind calls that enumerate users in domains with a large number of users take a long time and often never complete. There is a long chain of timeouts associated with these calls and many layers of indirection, which means that the calls will often just freeze rather than return in a sensible period of time (probably with an error). There are two aspects to the time required: the first is actually fetching all the required user information from the domain, and the second is the time spent caching the entries for the winbind daemon.

wbinfo

As the number of groups and users in the database increases, the time it takes to complete calls to wbinfo -g (groups) and wbinfo -u (users) will increase. Normally there are far more users than groups, so wbinfo -u is the call of most concern. Once these calls take around a minute, they will start to fail and continue to fail. There is currently no workaround that uses winbind to retrieve this information, although reducing the number of users could help. Consider retrieving this information through some other mechanism, e.g. via the SAMR pipe and its associated enumeration RPC calls, or via an LDAP (or local LDB) search query. The LDAP dirsync control (or samba-tool user syncpasswords, which uses this control) can also be used to maintain a constantly updating list of users that is reasonably close to the actual list at any point in time.
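
For example, a plain user listing via samba-tool (which talks to the database directly, or over LDAP, rather than through winbind) avoids these timeouts; the remote hostname and credentials below are placeholders:

 # locally on the DC
 samba-tool user list
 # or against a remote DC
 samba-tool user list -H ldap://dc1.samdom.example.com -U"SAMDOM\administrator"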

getent passwd and nsswitch user enumeration

When 'winbind enum users' is enabled in the smb.conf on a Samba domain controller (or 'winbind enum groups', in the case of many groups), 'getent passwd' and other tools relying on nsswitch suffer from the same problems as wbinfo described in the previous section.
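
If full enumeration through nsswitch is not actually needed, leaving these options at their default of 'no' in smb.conf avoids the problem entirely:

 [global]
     winbind enum users = no
     winbind enum groups = no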

LDAP full scans (and internal scans)

Once the total size of the sam.ldb database reaches several gigabytes, a full retrieval of the database with default attributes may start to take a minute or more. These reads can block writes and so may bog down the server (particularly DNS updates and logon success or failure accounting). If possible, avoid triggering LDAP full scans of the entire database (or even just the domain partition), and consider restricting the visibility of objects and attributes for ordinary users.
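
Where a search is unavoidable, scoping it to a subtree and requesting only the attributes that are needed keeps it far away from a full scan; a sketch using ldbsearch against the local database (the install prefix and OU are placeholders):

 ldbsearch -H /usr/local/samba/private/sam.ldb -b "OU=Sales,DC=samdom,DC=example,DC=com" -s sub "(objectClass=user)" sAMAccountName mail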

DRS replication

A full scan currently exists in the DRSUAPI pipe of the RPC server. The replication call also has a maximum wait time of 10 seconds for the searching it performs, which means other non-NETLOGON RPC calls can be delayed by up to roughly this amount. Under heavy replication load, expect the RPC server to have higher latencies.

Tombstones expunge

A full scan currently exists in the periodic check for tombstoned objects and linked attributes. This scan should not take more than 10 or 20 seconds, even with a database of several gigabytes; it may impede operations while it runs, but it does not run very frequently in the background.

256MB total limit for data returned

Due to a memory allocation limit within Samba, any LDAP search is restricted to returning less than roughly 256 MB of data. This is quite a lot of data, and if possible it may be advisable to stop users reading this much data from the database in one go by restricting visibility. There is no way to configure this limit; however, if you need to return more than 256 MB of data, you can use the paged results LDAP control. Alternatively, you can reduce the amount of data returned by a single LDAP query by filtering out attributes, changing the default visibility of attributes, or manually splitting the retrieval into more than one query, e.g. one search for one half of the attributes and another query for the other half, or dividing an OR search expression.
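
For example, with OpenLDAP's ldapsearch the paged results control is requested with -E pr=<page size>, so a large result set is returned in pages that each stay well below the limit (the server, credentials and base DN are placeholders):

 ldapsearch -x -H ldap://dc1.samdom.example.com -D "administrator@samdom.example.com" -W -b "DC=samdom,DC=example,DC=com" -E pr=1000/noprompt "(objectClass=user)" sAMAccountName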

LDAP bind

When binding as a user belonging to a group (directly or through nested groups) with many members, the bind time may be noticeably increased (2-3x as long for groups with 20,000+ users). This is because the database needs to load the entire group record and all the member entries. Work is in progress to improve this time.

Samba multi-process model

Under excessive load, the standard process model quickly consumes large amounts of memory and resources, which often results in the out-of-memory killer taking out services. In Samba 4.9 and above, it is strongly recommended to use the prefork process model for starting the Samba AD DC. More about the prefork model can be found at https://wiki.samba.org/index.php/Samba_server_process_model. One advantage of the prefork model is that an smb.conf option can be used to change the number of worker processes (globally and per service). On domain controllers with significant resources, this allows administrators to run one process per CPU, providing significant throughput while minimizing latencies.
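
A sketch of such a configuration: the model is selected when starting samba, and the number of worker processes is set with the 'prefork children' smb.conf option (the value below is only an illustration; the per-service form is described on the wiki page above):

 # smb.conf
 [global]
     prefork children = 8

 # start the AD DC with the prefork process model
 samba -M prefork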

The prefork process model will be the default in Samba 4.11, with four worker children per service.

Automatic restart of child processes and other new features

With Samba 4.10, there are a number of improvements so that errors affecting a single client connection do not affect the overall availability of the service. Refer to the Samba server process model wiki page and the associated smb.conf manpage documentation for the new parameters.