Samba4/DRS TODO List
Update to new doc release
We should look through the new WSPP docs release (from August 2009) and see what we haven't implemented yet, forming a more extensive todo list than this one. Now that we have basic replication working we can start to try to get all the corner cases right, and for that the docs (especially MS-DRSR and MS-ADTS) are a good source of information.
Why isn't repsTo written by Windows?
I have noticed that Windows is not sending us a DsUpdateRefs to update the repsTo when we join a Windows domain as a 2nd DC. This means if we followed the correct behaviour we would never send Windows a DsReplicaSync message, so we'd never tell Windows to replicate from us.
To work around this dreplsrv_notify_check() currently cheats by using repsFrom if repsTo is empty. We need to instead work out why Windows is not sending us DsUpdateRefs messages. Perhaps related to the lack of nTDSConnection objects?
Update: Discussions with the Microsoft AD team indicate that this is probably caused by delays in the Windows DC adding the Samba DC as a replication partner. The problem resolves itself after about 30 minutes or so. Sometimes running "repadmin /kcc" on the Windows DC helps.
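The repsFrom fallback described above can be sketched as follows. This is only an illustration of the idea, not the code in dreplsrv_notify_check(); the function name and the plain lists standing in for the repsTo and repsFrom attributes are assumptions.

```python
def partners_to_notify(reps_to, reps_from):
    """Return the partners that should be sent a DsReplicaSync.

    Falls back to reps_from when reps_to is empty. This mirrors the
    workaround, not the behaviour the docs describe.
    """
    if reps_to:
        return list(reps_to)
    return list(reps_from)
```

Once Windows reliably sends DsUpdateRefs, the fallback branch should be removed so that only repsTo is used.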
The repadmin.exe tool on windows is a great way of seeing the status of replication. We would like to get all of the options of repadmin working when directed at a Samba4 DC. Anatoliy is working on making some of the functions work, but there are plenty more to do.
If DRS replication adds a link to an object that doesn't exist we are supposed to create a "phantom" object, which gets filled in later. We are working around that at the moment by delaying link creation until the end of the transaction for the replica cycle, but we should also support phantom objects.
Speed up replmd_ldb_message_element_attid_sort
The replmd_ldb_message_element_attid_sort function is pretty inefficient. We need to avoid the attribute lookups in the sort comparison function.
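One way to avoid lookups inside the comparison is to look up each attribute ID once, then sort on the cached IDs. The sketch below shows the idea in Python rather than C; the function name, the (name, values) pair layout and the EXAMPLE_ATTIDS table are assumptions for illustration only.

```python
# Illustrative attribute IDs; a real table would come from the schema.
EXAMPLE_ATTIDS = {"objectClass": 0, "cn": 3, "description": 13}


def sort_elements_by_attid(elements, lookup_attid):
    """Sort (name, values) pairs by attribute ID.

    lookup_attid is called exactly once per element, not once per comparison.
    """
    keyed = []
    for index, (name, values) in enumerate(elements):
        # Cache the ID next to the element; index keeps the sort stable.
        keyed.append((lookup_attid(name), index, (name, values)))
    keyed.sort(key=lambda item: item[:2])
    return [element for _, _, element in keyed]
```

For n elements this does n lookups instead of roughly n log n lookups during the sort.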
Don't allow replication of readonly attribs
We should not allow replication to overwrite readonly attributes. There are other attribute flags we aren't honouring as well. We should check the docs and add support for all the relevant attribute flags.
A RODC (read-only domain controller) is a potentially very useful use case for Samba4. There are quite a lot of changes in replication and attribute filtering that should be done when we are a RODC.
- Modify the provision script to mimic the dcpromo to RODC operations
- Support for the RODC filtered attribute set
- Create the RODC default filtered attribute set:
ms-PKI-DPAPIMasterKeys, ms-PKI-AccountCredentials, ms-PKI-RoamingTimeStamp, ms-FVE-KeyPackage, ms-FVE-RecoveryGuid, ms-FVE-RecoveryInformation, ms-FVE-RecoveryPassword, ms-FVE-VolumeGuid, ms-TPM-OwnerInformation
Howto: http://technet.microsoft.com/en-us/library/cc772331(WS.10).aspx
- Implement marking an attribute as confidential.
- If you try to add a system-critical attribute to the RODC filtered set while the schema master is running Windows Server 2008, the server returns an LDAP error "unwilling To Perform" (0x35).
- Mark as confidential any attributes that you configure as part of the RODC filtered attribute set.
- Support Administrator role separation - delegate the local administrator of an RODC to domain user or security group without granting that user or group any rights for the domain or other domain controllers.
- Unidirectional replication - allow only inbound replication
- Read-only database - LDAP clients that want to perform a write operation are referred to a writable domain controller in the hub site.
- Credential caching - By default, an RODC does not store account credentials, except for its own computer account and a special krbtgt account for that RODC. You must explicitly allow any other credentials to be cached on that RODC, including the appropriate user, computer, and service accounts, to allow the RODC to satisfy authentication and service ticket requests locally. Howto: http://technet.microsoft.com/en-us/library/cc754218(WS.10).aspx
Separate gc partition
Right now the gc partition is just an amalgamation of the normal base partitions, with no filtering (we just set the magic control to say that searches should cross partition boundaries).
We need to decide if we should make a separate ldb for the gc partition, and if so what method we will use to keep it in sync. If we don't create a separate partition then we should add the right filtering to gc searches.
If modify sets attrib to same value then no replPropertyMetaData change
A modify via DRS replication that asks for an attribute to change to the same value it already has should be filtered out by repl_meta_data.c so that the replPropertyMetaData attribute is not updated.
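A minimal sketch of the filtering, assuming the object and the incoming modify are both represented as dictionaries mapping attribute names to lists of values, and that value order does not matter. The function name is hypothetical and this is not the repl_meta_data.c code.

```python
def filter_noop_replaces(current, replaces):
    """Drop replace operations that would leave the stored value unchanged.

    current:  dict of attribute name -> list of stored values
    replaces: dict of attribute name -> list of requested values
    Only the entries returned here should touch replPropertyMetaData.
    """
    changed = {}
    for attr, new_values in replaces.items():
        # Compare as unordered collections (an assumption of this sketch).
        if sorted(current.get(attr, [])) != sorted(new_values):
            changed[attr] = new_values
    return changed
```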
Fix error mapping (no FOOBAR, and replmd_replicated_request_werror)
We have lots of code that returns WERR_FOOBAR or NT_STATUS_FOOBAR because we didn't know what error to return. We need to go through these and either work out the correct error code, or if that is hard then at least put a reasonable guess of the right error code along with a TODO comment to check it.
Honor attribute replication flag
There is an attribute flag for whether particular attributes should be replicated. We need to check that we get this right.
Update: We are now honoring the replication flags, although not the GC filtering flags or the RODC filtering flags
Handle add where DN exists, but different GUID
We may need to handle the case where a DRS replication comes in for a DN that exists, but with a different GUID. We need to test with windows on how this is handled.
Net commands to query repl status (via DRS?)
We should add net commands for querying the replication status (somewhat like repadmin.exe).
Max number of attributes on objects?
Metze noticed that the WSPP docs specify a maximum object size in AD of around 8k. This seems to translate into a maximum number of attributes that windows accepts. We may need to implement a similar limit to prevent problems with replication s4->windows.
Obey acls on objects
We need to obey the ntSecurityDescriptor on objects in our SAM. This is a large task! Nadya is working on it and hopefully will merge soon.
Fix ldb_add objectclass sorting
In ldb_add we sort objectClass attributes in the objectclass module. The sort is currently horribly inefficient - it needs redoing using the sort indexes that Andrew and Nadia have recently added.
-s option to setup_dns.sh
The setup_dns.sh should be redone as a python wrapper so it obeys standard options like -s and can read smb.conf
s4<->s4 in make test
We should add the s4<->s4 vampire and replication in make test
We need to add the urgent bit on replications that have changed critical objects (see the docs for a list). We will probably need to expand @REPLCHANGED to add a uSNUrgent attribute to support this.
We are not currently obeying group policies, although we can serve them out to clients. We need to obey the ones that make sense for Samba. For this we need to provide a really easy API to allow any part of Samba to query a group policy, and to auto-update SAMDB with the needed changes.
Add support for ndr64 to wireshark
When watching w2k8-R2 <-> w2k8-R2 interactions, windows chooses NDR64 instead of NDR. We now support NDR64 in Samba, but wireshark doesn't understand it. To allow us to watch traffic between w2k8-R2 boxes we would like wireshark to understand NDR64.
Convert wireshark drsuapi to pidl
The DRSUAPI decoder in wireshark is quite poor. We should redo it using a pidl based parser.
Fix decryption of w2k8 by wireshark (krb5 patch)
When watching w2k8 <-> samba traffic in wireshark we often find that wireshark cannot decrypt some of the traffic. This is due to a bug/limitation in MIT kerberos. Metze has a hack based on LD_PRELOAD that works around this, but we should try to get this into the wireshark svn tree directly.
How does another DC become the FSMO master and RID master?
We need to work out how a DC should become the FSMO master and RID master. We can do it now via ldbedit, but there should be a more automated method (perhaps the KCC should do this?)
Add LDAP backend to BIND that uses AD's in-directory format
We need to serve DNS based on the data in Active Directory. There is an LDAP backend for BIND already, but there is no doubt a lot of work between that and using it against an AD-like database
Incorporate nssupdate-gss into the 'net vampire' command
We need to call nssupdate-gss at the end of the 'net vampire' command, so that we don't need to manually run 'setup_dns.sh' after a vampire
Script and tests for takeover of FSMO
We should have a script to (optionally forcibly) take over the FSMO roles of a domain
(currently this is done by a local modify to our LDB, but a script like the one to raise the functional level would be good)
DRS_GET_ANC: Sort the result so that it includes updates to ancestor objects before updates to their descendants.
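One simple ordering that satisfies this is to sort by the number of DN components, since an ancestor always has fewer components than its descendants. The sketch below assumes changes are dictionaries with a "dn" key and that DNs contain no escaped commas; both are simplifications for illustration.

```python
def dn_depth(dn):
    """Number of RDN components in a DN (assumes no escaped commas)."""
    return len(dn.split(","))


def sort_ancestors_first(changes):
    """Order changes so every ancestor precedes its descendants.

    The sort is stable, so objects at the same depth keep their
    original relative order.
    """
    return sorted(changes, key=lambda change: dn_depth(change["dn"]))
```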
DsGetNCChanges:Replication rights check
The IsGetNCChangesPermissionGranted procedure returns true if the source DC has permission to replicate objects and their attributes from the NC replica, as defined in msgIn.
DsGetNCChanges:Getting the changes based on the input UpToDateVector
The input UDV gives an invocation_id and a highest_usn for each neighbor. For each neighbor we should return all changes above the specified USN.
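As a rough sketch of that filtering, assume the UDV is a dictionary from invocation_id to highest_usn and each change records its originating invocation_id and USN. The function name and the dictionary layout are hypothetical, not the drsuapi structures.

```python
def changes_not_covered(changes, udv):
    """Return the changes the requester has not yet seen.

    changes: list of dicts with "invocation_id" and "usn" keys
    udv:     dict of invocation_id -> highest_usn already held
    An invocation_id missing from the UDV means nothing from that
    originator has been seen, so every change from it is returned.
    """
    return [
        change for change in changes
        if change["usn"] > udv.get(change["invocation_id"], 0)
    ]
```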
Implement dirsync control for LDAP
The dirsync control is used by AD aggregation software like MIIS or IIFP. The requester uses it to ask for the list of changes since the last query for a given ID. This page gives a short introduction to the control and its use: http://support.microsoft.com/kb/891995
Handle conflicts in repl_meta_data
These need to be resolved via changetime and originating invocation ID, for both normal attributes and linked attributes
We need a testsuite for this, which should suspend replication, make conflicting changes, and then allow replication again.
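The comparison could look something like the sketch below, which orders by changetime first and uses the originating invocation ID only as a tie-break. The dictionary keys and the string comparison of invocation IDs are assumptions, and the docs may specify further fields that take part in the comparison, so this needs checking against MS-DRSR.

```python
def incoming_wins(local, incoming):
    """Decide whether an incoming attribute value replaces the local one.

    Both arguments are dicts with "changetime" (comparable timestamp)
    and "invocation_id" (originating invocation ID as a string).
    The later changetime wins; the invocation ID breaks ties so that
    every DC reaches the same answer.
    """
    local_stamp = (local["changetime"], local["invocation_id"])
    incoming_stamp = (incoming["changetime"], incoming["invocation_id"])
    return incoming_stamp > local_stamp
```

Because the result depends only on the two stamps, both DCs in a conflict pick the same winner regardless of which one applies the change first.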
Refuse some privileged controls over ldap
Some of our ldb controls which are intended for internal use only need to be refused over ldap, or at least refused without admin privileges. We need to review the list of controls we handle, and probably have a list of ones that are allowed for non-admin users.
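A minimal sketch of the allow-list approach follows. The only OID shown is the standard LDAP paged results control; which controls really belong on the list is exactly what needs reviewing, and the function and set names are hypothetical.

```python
# Controls a non-admin LDAP client may use. Only an example entry;
# the real list has to come from reviewing the controls we handle.
PAGED_RESULTS_OID = "1.2.840.113556.1.4.319"
NON_ADMIN_ALLOWED = {PAGED_RESULTS_OID}


def refused_controls(requested_oids, is_admin):
    """Return the requested control OIDs that must be refused.

    Admin users may use any control in this sketch; everyone else is
    limited to the allow-list.
    """
    if is_admin:
        return []
    return [oid for oid in requested_oids if oid not in NON_ADMIN_ALLOWED]
```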
Refuse schemaUpdateNow when a transaction is active
Lots of Samba4 code assumes that pointers into the schema are constant for the life of a transaction. We need to enforce not having an open transaction when we do a schemaUpdateNow
Give 'no such object' error when using a deleted object as a base DN
Currently, we just filter deleted objects from the reply set (by adding to the search filter), but do not give the additional error code when a deleted object is used as a base DN
Add nested transactions to ldb, using ldb_tdb
TDB needs to be extended to have fully nested transactions, and ldb needs to be modified to pass transaction nesting down to the tdb layer (which now knows a little more about possible safe nested transactions)
Handle protected objects in delete
Find out what 'protected objects' are, and figure out how to implement them
In particular, updating well known GUIDs when renaming a well known object
Determine if we can rename an object that is pointed to by a well known object, if the well known GUID update is manual or automatic, and how to update them.
Fix dcdiag.exe errors
dcdiag reports quite a few errors against a Samba DC, including missing attributes, missing RPC calls and other failures.
You can see some sample output here: http://samba.org/tridge/dcdiag.txt
Don't put NC attributes in RID Manager$
Our replication code treats replication of "RID Manager$" like all other NCs, which means it adds a replUpToDateVector and repsFrom attributes. These should only be put on real NCs, not the "RID Manager$" object.
Fix finddcs_send in libnet_lookup.c to use DNS
It uses NBT, then falls back to a bogus DNS lookup of the netbios domain name without the right suffix. It should start with DNS and only try NBT if that fails.
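The intended lookup order can be sketched as below. The two lookup callables are stand-ins supplied by the caller (they are not functions from libnet), and the sketch assumes the DC locator SRV name _ldap._tcp.dc._msdcs.<domain> is what gets queried.

```python
def find_dcs(dns_domain, netbios_domain, dns_srv_lookup, nbt_lookup):
    """Locate domain controllers, preferring DNS over NBT.

    dns_srv_lookup(name) and nbt_lookup(name) are caller-supplied
    callables returning a list of addresses (empty if none found).
    """
    srv_name = "_ldap._tcp.dc._msdcs." + dns_domain
    dcs = dns_srv_lookup(srv_name)
    if dcs:
        return dcs
    # Only fall back to NBT when DNS found nothing.
    return nbt_lookup(netbios_domain)
```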
Removing a DC from the Domain Controllers container when using windows user/group admin tool against a s4 DC fails with "bad stub data". It generated a fault on the wire.
Join w2k8 to samba4 dc
We've been concentrating up to now on Samba4<->Samba4 replication, and Samba4<->Windows replication where the Samba4 server joins the Windows domain. A more difficult problem is making it work when you start with a Samba4 domain (from provision, or from vampiring a Windows domain) and then try to add another Windows DC by using dcpromo. This is currently failing with an obscure error at the end of the dcpromo process.
Update: We finally achieved this on 25th September. Currently the changes needed are in the plugfest branch (see http://git.samba.org/?p=tridge/samba.git;a=shortlog;h=refs/heads/plugfest) but we expect to move them to master after we have cleaned up the binary DN handling.
Update2: This is now in master.
Create connection object (nTDSConnection)
Our KCC implementation (in source4/dsdb/kcc) is very simple at the moment. It should work by creating nTDSConnection objects under the nTDSDSA objects in the LDAP tree, then use those to create the repsFrom attributes, and possibly send DsUpdateRefs operations to the other DCs to set up a repsTo on each replication partner.
Right now we don't create nTDSConnection objects at all, which needs to be fixed.
Hook delete in repl_meta_data
Right now we just pass delete operations down through the repl_meta_data module to the ldb_tdb backend. That means that deletes are not replicated (as they don't change anything in ReplPropertyMetaData or in the uSNChanged attribute).
We should intercept delete operations and translate them into a combination of a rename to an object in the "Deleted Objects" tree, along with a modify to add the isDeleted attribute. Then we need to set up the tombstone data in the object, and add a tombstone reaping task that would run once a day to really delete expired tombstone records.
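The delete translation and the reaping task might be sketched as follows. Objects are plain dictionaries here, the "deleted_at" field and the function names are hypothetical, the tombstone RDN format (original RDN plus "\0ADEL:" and the GUID) is an assumption to check against the docs, and the tombstone lifetime is left as a parameter.

```python
SECONDS_PER_DAY = 86400


def make_tombstone(obj, deleted_container, now):
    """Turn a delete into a rename plus an isDeleted marker.

    obj is a dict with "dn" and "guid"; the result lives under
    deleted_container and records when it was deleted.
    """
    rdn = obj["dn"].split(",", 1)[0]
    return {
        "dn": f"{rdn}\\0ADEL:{obj['guid']},{deleted_container}",
        "guid": obj["guid"],
        "isDeleted": True,
        "deleted_at": now,
    }


def reap_tombstones(objects, now, lifetime_days):
    """Return the objects that survive reaping.

    Tombstones whose age has reached lifetime_days are dropped; live
    objects and younger tombstones are kept.
    """
    cutoff = now - lifetime_days * SECONDS_PER_DAY
    return [
        obj for obj in objects
        if not (obj.get("isDeleted") and obj["deleted_at"] <= cutoff)
    ]
```

Because the tombstone keeps the GUID and is a real modify plus rename, it changes the replication metadata and so gets replicated like any other change.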
Sort objects on disk
Some sysadmins might write scripts that rely on the return order of attributes within objects (eg. objectclass first). We sort objects on add in repl_meta_data.c to cope with this but we don't fix the sorting on modify. That should be fixed.
But note that despite the appearance that attributes are sorted by attributeID, generated attributes appear last, regardless.
Microsoft has told us that in their opinion we don't have to do this.
Check for parent exists in replication add and rename
During replication add and rename we need to check that the destination parent exists.
What triggers initial kcc run on windows after we join a w2k8 DC?
After we join a s4 DC to a windows domain, we've noticed that w2k8 needs to be prompted to run its KCC using "repadmin /kcc". We need to work out why this is needed so we can fix it.
Update: see the comments on repsTo update above
We currently accept the w2k8 linked attributes in replication, but when other DCs replicate to us we serve up linked attributes as normal attributes (as a downlevel w2k3 DC does). We should store the full meta data associated with linked attributes in more fields in the extended DN and serve it up in getncchanges.
bitmap32 actually 3264 in samr QueryUserInfo level 16? (netmon bug too)
There seems to be a problem with the QueryUserInfo level 16 and NDR64. The Microsoft netmon 3.3 parser has the same problem as our ndrdump parser. We need to look into how this should be handled.
Update: this was fixed by the addition of NDR64 union alignment.
Implement RID Master and RID pools
We need to implement the RID master and allocate RIDs out from the RID pool
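A toy model of the scheme: the RID master hands out non-overlapping ranges, and each DC allocates from its own range until it runs out and asks for another. The class names, the starting RID and the pool size are all illustrative assumptions, and the real exchange between DCs would go over the wire rather than through a method call.

```python
class RidMaster:
    """Hands out non-overlapping RID ranges (pools) to DCs."""

    def __init__(self, first_rid=1000, pool_size=500):
        self._next = first_rid
        self._pool_size = pool_size

    def allocate_pool(self):
        """Return an inclusive (low, high) range not given out before."""
        low = self._next
        self._next += self._pool_size
        return low, self._next - 1


class LocalRidAllocator:
    """Per-DC allocator that draws RIDs from its current pool."""

    def __init__(self, master):
        self._master = master
        self._next, self._high = master.allocate_pool()

    def next_rid(self):
        # Fetch a fresh pool from the master once this one is used up.
        if self._next > self._high:
            self._next, self._high = self._master.allocate_pool()
        rid = self._next
        self._next += 1
        return rid
```

Since only the master decides the ranges, two DCs can create accounts at the same time without ever issuing the same RID.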
Single module stack
Samba4 is mostly run as a domain controller, but we have the option to run it standalone or as a member server. This codepath and module stack is largely untested.
The goal here is to run with the repl_meta_data module for all (non-LDAP) configurations.
Blockers: Need an invocationID for repl_meta_data to place into the replMetaData record. Currently we don't have one because we are not a DC, and don't have a CN=NTDS Settings record
With the changes being made to repl_meta_data, we now store extra metadata in the extended DN. This information needs to be transferred between the old and new DN values in a source link, not discarded
Filter on Up-to-dateness vector
We should filter not only on the usnChanged, but also on the up to dateness vector supplied by the replication partner
We store parentGUID in the object on disk at the moment, whereas we should construct it at runtime when asked for.
Double cn fix
When we do a s4<->s4 vampire we end up with the rDN attribute appearing twice on all objects in the new replica. We think this is because we should be filtering the rDN in the getncchanges code, but this needs checking.
Update: this should be fixed by the latest commits from the plugfest
Add interim DNS zone generator based on sam.ldb
We should have provision generate a zone file based on all the listed domain controllers in sam.ldb, not just a single DC. It could look up the other DCs with DNS to find their IP, and use the specified IP for the new server.
This will help us handle the DNS reproducibly while we wait for the above item.
Update: added as scripting/devel/rebuild_zone.sh