LDB Greater than and Less than indexing

From SambaWiki

Overview

Added in Samba version: 4.11

<What does the feature do, Why might users care about it, What is the Microsoft-equivalent that a Windows admin might be familiar with, etc>

How to configure it

For new installations, nothing is required (newly joined DC). The speed-up should just be present for schema attributes which have indexing enabled e.g. uSNChanged.

For old installations, a re-index is required (detailed below).

To enable indexing on attributes not indexed by default, you must add the index flag on the appropriate schema object in the database as per standard Active Directory.

Known issues and limitations

This only currently supports indexing of integers, both 32-bit and 64-bit stored in the database. Supporting string indexing is much more difficult when considering unicode charsets and potentially differing sort orders in different locales. Certain attributes might only be ASCII and so could be extended in the future, but there isn't anything planned so far.

Old AD databases using the new code

Active Directory domain controller databases using the older indexing scheme will need to be re-indexed using samba-tool dbcheck --reindex. Without re-indexing the database, queries which rely on integers stored in the index will no longer be returned.

This may have critical impacts on tools and services relying on these attributes!

Troubleshooting

An ldbdump (and additionally mdb_dump for LMDB) of the database might help to determine which indexing format is currently being used by your database.

Basic timings can be used to determine if the index is in effect and noticeably the search should be proportional to the actual data asked for. e.g. 1000 entries and >= should only return 100, will be much faster than without the indexing code. Without the indexing code, whether >= returns 900 records or 100 records, the timings should be similar. Additionally there is a secret option for disabling full scans, but this is mostly for developer usage.

For Developers

How to test it

make test TESTS="ldb samba4.ldap.index.python samba.tests.complex_expressions"

lib/ldb/tests/python/index.py and lib/ldb-samba/tests/index.py contain a similar test for indexing, but lib/ldb contains the testing for the new "Ordered Integer" 64-bit integer schema syntax, while lib/ldb-samba contains the testing for the Samba AD syntax for 32-bit integer syntax (which pre-dates this work). ORDERED_INTEGER was added to the proto-schema handling in @ATTRIBUTES in order to expose the syntax for the tests.

Where the code is located

Test files are referenced above, but the code lives in three different sections:

  • LDB key value layer
  • LDB indexing layer
  • Samba AD schema syntax injection

LDB key value layer

In order to implement range indexing, we needed to implement range queries on the underlying key value store. This only works in LMDB with (lexicographically) sorted keys. This works like the standard iterate function, but we supply a start and end key (inclusive). This layer is agnostic to all types of indexing formats, storage formats etc.

LDB indexing layer

At the LDB indexing layer, we need to create a GUID list to return to the overall indexing code which is in the same format as a single index fetch to the database (which reads out the packed GUIDs). In our case, we need to construct our GUID list artificially from multiple fetches to the database.

At this layer we need to define our start and end keys. In order to avoid defining a start and end key for the search, we notice that each index key is of the form: DN=@INDEX:<ATTRIBUTE>:<VALUE>\0. We can simply make our start key DN=@INDEX:<ATTRIBUTE>: and our end key DN=@INDEX:<ATTRIBUTE>; to return all index entries for a particular attribute.

But then, a canonicalization function which returns a lexicographically sorted key is required. Originally we wanted to use the existing canonicalization functions, but these appear to be used elsewhere in the code (for uses other than creating index keys). This is why we introduced the index_format_fn into the schema syntax to offload that work

Samba AD schema syntax injection

Future improvements

extended operation + no allocation

implementation of subtree using this technique... and one level indices...

Reference Docs

Nothing in particular.