Dbcheck

From SambaWiki
Revision as of 01:42, 24 July 2024 by Dbagnall (talk | contribs) (→‎What it does: Add a detailed breakdown of dbcheck default behaviour)

The samba-tool dbcheck utility enables you to detect and fix problems in the Samba AD database.

You must run the check and fix command on every Samba AD DC locally, because some fixes apply to non-replicated attributes and modifications are not replicated to other DCs. The tool cannot be run over LDAP.

To check the AD database, run:

# samba-tool dbcheck --cross-ncs

The --cross-ncs option checks all AD partitions (naming contexts). Without this option, the tool only checks the main domain partition.

To fix reported errors, run:

# samba-tool dbcheck --cross-ncs --fix

You are prompted to confirm the fix for each individual error. Choosing 'all' fixes every error of that same type.

If you pass the --yes parameter to the command, all questions are automatically answered with yes. Note that if you omit the --yes parameter, the database check executes three fsync() calls for each object, which can make the run considerably longer. For example, in our test environment, the command fixed 3500 objects in 10 seconds with the --yes parameter. Without it, the same operation took 4 minutes 50 seconds.

After a repair, re-check the database to verify a successful operation.
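
The check, fix, and re-check sequence described above can be scripted. The following is a minimal Python sketch, not part of Samba. The helper names dbcheck_command and check_fix_recheck are hypothetical, and only the --cross-ncs, --fix, and --yes options mentioned above are used. It assumes samba-tool is on the PATH and that the script runs locally on the DC with sufficient privileges.

```python
import subprocess


def dbcheck_command(fix=False, yes=False, cross_ncs=True):
    """Build the samba-tool dbcheck argument list (hypothetical helper)."""
    cmd = ["samba-tool", "dbcheck"]
    if cross_ncs:
        cmd.append("--cross-ncs")  # check all partitions, not only the domain one
    if fix:
        cmd.append("--fix")
        if yes:
            cmd.append("--yes")  # answer every prompt with yes
    return cmd


def check_fix_recheck():
    """Report, repair, then re-check. Must run locally on the DC."""
    subprocess.run(dbcheck_command(), check=False)
    subprocess.run(dbcheck_command(fix=True, yes=True), check=False)
    # Return the exit status of the verification pass; interpreting it
    # is left to the caller.
    return subprocess.run(dbcheck_command(), check=False).returncode
```

Building the argument list separately from running it makes it possible to log or review the exact command before any change is made to the database.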

What it does

The tool goes through every object in the database and checks it for consistency. It applies a number of rules, each covering a problem that has been known to occur in the past with Samba. For example, linked attributes are usually bi-directional, so if dbcheck finds a two-way link that's missing a back-link, then that's a problem.

The tool can take some time to run, depending on the size of your database. Not only does it check every user and every group object, it also checks that every member in a group points to a valid user object, and that the user has a matching return link.
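
To make the link check concrete, here is a toy Python model of it. This is only an illustration of the idea, not how dbcheck is implemented: the directory is represented by two plain dictionaries, the function name find_link_problems and the example DNs are hypothetical, and DNs are compared as exact strings, which a real directory would not do.

```python
def find_link_problems(groups, users):
    """Return (missing_targets, missing_backlinks) for a toy directory.

    groups maps a group DN to its 'member' list (forward links);
    users maps a user DN to its 'memberOf' list (back-links).
    """
    missing_targets = []
    missing_backlinks = []
    for group_dn, members in groups.items():
        for user_dn in members:
            if user_dn not in users:
                # The forward link points at an object that is not there.
                missing_targets.append((group_dn, user_dn))
            elif group_dn not in users[user_dn]:
                # The user exists but lacks the matching return link.
                missing_backlinks.append((group_dn, user_dn))
    return missing_targets, missing_backlinks


groups = {
    "CN=staff,DC=example": [
        "CN=alice,DC=example",
        "CN=bob,DC=example",
        "CN=ghost,DC=example",
    ],
}
users = {
    "CN=alice,DC=example": ["CN=staff,DC=example"],
    "CN=bob,DC=example": [],  # back-link missing
}
targets, backlinks = find_link_problems(groups, users)
print("missing link targets:", targets)
print("missing back-links:", backlinks)
```

Every group member is looked up once, which is why the run time grows with the number of links as well as the number of objects.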

Some database problems will propagate throughout the domain DCs, via DRS replication. Other problems might be localized to a specific DC, as they may affect a non-replicated attribute, or how the database contents are stored on disk.

What it does in detail

1. Check deleted objects containers.

For each naming context that should have a "well known object" magic DN for a deleted objects container, but doesn't:

  1. check for an existing object with the deleted objects container DN (i.e. it is masquerading but not functioning as the DOC). If it exists, rename it as a conflict record.
  2. check whether the naming context has a wellKnownObjects attribute pointing to the deleted objects container. If there is, record the object GUID.
  3. create a new deleted objects container. If there was one listed in the wellKnownObjects, ensure the new container has the objectGUID it refers to.
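
The three steps above can be sketched as a small planning function. This is a toy model in Python that touches no database: the function name plan_deleted_objects_repair, the action names, and the example values are all hypothetical, and the sketch assumes the container is named "CN=Deleted Objects" directly under the naming context.

```python
def plan_deleted_objects_repair(nc_dn, existing_dns, well_known_guid=None):
    """Return the ordered actions for a naming context whose deleted
    objects container is missing. Nothing is modified here.

    existing_dns is the set of DNs currently present; well_known_guid is
    the GUID recorded from wellKnownObjects, or None if there was none.
    """
    doc_dn = "CN=Deleted Objects," + nc_dn
    actions = []
    if doc_dn in existing_dns:
        # Step 1: an impostor occupies the DN, so move it out of the way.
        actions.append(("rename_as_conflict", doc_dn))
    # Steps 2 and 3: create the container, reusing the recorded GUID if any.
    actions.append(("create_container", doc_dn, well_known_guid))
    return actions
```

For example, calling plan_deleted_objects_repair("DC=example,DC=com", set()) yields a single create action with no GUID, while passing a set that already contains the container DN adds a rename action in front of it.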

2. Check every object

For every DN, check that object's attributes for consistency. If samba-tool dbcheck is run with, say, --attrs='systemflags isdeleted', only those attributes will be checked. The default is to check them all. Not all problems can be fixed. The fact that a problem is checked doesn't mean it has ever occurred in the wild.

The following problems are noticed by default:

  1. the object has changed after it was deleted.
  2. the object doesn't have exactly one "name" attribute.
  3. the object doesn't have exactly one rdn attribute (e.g. "cn" for "cn=foo,DC=bar")
  4. there is a replPropertyMetaData attribute containing a zero-guid invocationId.
  5. replPropertyMetaData mentions attributeIDs outside the schema.
  6. replPropertyMetaData has duplicate attributeIds
  7. replPropertyMetaData has incorrect attributeIds
  8. replPropertyMetaData has badly sorted attributeIds
  9. replPropertyMetaData has duplicate attributeIds
  10. replPropertyMetaData has a bad initial value (should be 0)
  11. deleted objects container has wrong ACL (--reset-well-known-acls to fix others)
  12. ntsecuritydescriptor ACE inherited types are inconsistent
  13. ntsecuritydescriptor lacks owner or group SIDs
  14. objectClass sorting is incorrect
  15. userParameters is bad (due to an early Samba bug or bad replication)
  16. duplicate governsId or attributeID (incautious messing with schema)
  17. empty attributes
  18. unknown attributes
  19. for attributes that contain DNs:
    1. if a linked attribute, check for duplicate links
    2. check for orphaned back-links (i.e. missing forward links)
    3. check for missing back-links
    4. check for missing GUID component
    5. check the GUID is correct
    6. missing link targets (the object is not there)
    7. binary DN is incorrect
    8. linked attributes on a tombstoned object
    9. target is a deleted object
    10. dn string has changed (cosmetic only, as GUID resolves link)
    11. dn lacks a SID component
  20. more checks around the intersection of deletions and links
  21. for non-DNs, check for string normalisation
  22. for non-DNs, check for duplicates
  23. check for wrong instanceType
  24. missing name value
  25. name does not match dn
  26. if it is a deleted objects container, check timestamp
  27. fix replPropertyMetaData if there's a mismatch
  28. check FSMO status is consistent
  29. check the parent object exists for this object's DN
  30. if it is a deleted objects container, check multiple attributes
  31. if it is a dnsPartition, check the repsfrom locations (with different behaviour for an RODC)
  32. if a serverReference object, check RIDs
    1. if RID master, check more
  33. if a rIDSetReferences object, check RID attrs (ranges, conflicts with SIDs, etc)
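
As an illustration of what a few of the replPropertyMetaData checks in the list above amount to, here is a toy Python sketch. It is not dbcheck's code: the function name check_metadata and the data layout are hypothetical, metadata is reduced to (attributeID, invocationId) pairs, and "badly sorted" is assumed to mean "not in plain ascending numeric order", which may be simpler than the ordering rule Samba really applies.

```python
ZERO_GUID = "00000000-0000-0000-0000-000000000000"


def check_metadata(entries, schema_attids):
    """Return a list of problem descriptions for one object's metadata.

    entries is a list of (attid, invocation_id) pairs in stored order;
    schema_attids is the set of attribute IDs known to the schema.
    """
    problems = []
    attids = [attid for attid, _ in entries]
    if any(inv == ZERO_GUID for _, inv in entries):
        problems.append("zero-guid invocationId")
    if any(attid not in schema_attids for attid in attids):
        problems.append("attributeID outside the schema")
    if len(set(attids)) != len(attids):
        problems.append("duplicate attributeIds")
    if attids != sorted(attids):
        problems.append("badly sorted attributeIds")
    return problems


entries = [(3, "inv-a"), (1, ZERO_GUID), (3, "inv-b"), (9, "inv-c")]
for problem in check_metadata(entries, schema_attids={1, 3}):
    print(problem)
```

Each rule is independent of the others, so a single object can be reported for several problems in one pass.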


3. Check the RootDSE

Mainly, this checks that the dsServiceName attribute looks OK.

Why run it

Imagine an older version of Samba contains a bug that sometimes inserts an incorrect database record. Even after you upgrade to a Samba version in which the bug is fixed, the incorrect records remain in your database.

Sometimes a problematic database record may result in an obvious error. However, other times it could result in a more subtle problem. For example, if a user was missing a 'memberOf' backlink, then that user would still appear as a member of the group (i.e. checking the 'member' forward link works correctly), but the group permissions (which check 'memberOf') might not be applied correctly when the user logs in.

As database inconsistencies have the potential to accumulate over time, it is worth running dbcheck regularly.

Precautions

The dbcheck tool modifies your database in order to remove corner-case problems. There is a small risk that the act of fixing a problem could itself have unforeseen and negative side effects.

Note that just running the dbcheck reporting on its own (without --fix) is completely safe.

It is worth taking a domain backup before using the --fix option (and it's probably best to take both an online and offline backup).

If fixing dbcheck errors seems to have introduced problems to your DC, you could try the following:

  1. Use samba-tool drs replicate --full-sync --sync-forced for each of your partitions. This means the DC receives the domain database in its entirety from another DC. This should force the local DC to overwrite its database with the updated contents.
  2. If that didn't help, try re-joining the DC, which will completely rebuild the DC's local database.
  3. Finally, as a last resort, you could try reverting to the domain backup-file. Restoring the domain backup-file involves rejoining all the DCs from scratch, so you would only resort to this if all the DCs in the domain had problems.
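
For the first recovery step, the per-partition replication commands can be generated rather than typed by hand. The sketch below is a hedged Python example: the function name full_sync_commands, the DC names, and the partition list are hypothetical, and it assumes that samba-tool drs replicate takes the destination DC, the source DC, and the naming context as positional arguments in that order. Check samba-tool drs replicate --help on your version before relying on it.

```python
def full_sync_commands(local_dc, source_dc, partitions):
    """Build one 'samba-tool drs replicate' command per partition.

    local_dc is the DC being repaired (replication destination);
    source_dc is a healthy DC to pull from.
    """
    commands = []
    for nc in partitions:
        commands.append([
            "samba-tool", "drs", "replicate",
            local_dc, source_dc, nc,
            "--full-sync", "--sync-forced",
        ])
    return commands


# Hypothetical DC names and partitions, for illustration only.
partitions = [
    "DC=example,DC=com",
    "CN=Configuration,DC=example,DC=com",
    "CN=Schema,CN=Configuration,DC=example,DC=com",
]
for cmd in full_sync_commands("dc2", "dc1", partitions):
    print(" ".join(cmd))
```

The commands are only printed here, so they can be reviewed and then run one at a time on the affected DC.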