Samba4/DRS TODO List
Join w2k8 to samba4 dc
We've been concentrating up to now on Samba4<->Samba4 replication, and Samba4<->Windows replication where the Samba4 server joins the Windows domain. A more difficult problem is making it work when you start with a Samba4 domain (from provision, or from vampiring a Windows domain) and then try to add another Windows DC by using dcpromo. This is currently failing with an obscure error at the end of the dcpromo process.
Update: We finally achieved this on 25th September. Currently the changes needed are in the plugfest branch (see http://git.samba.org/?p=tridge/samba.git;a=shortlog;h=refs/heads/plugfest) but we expect to move them to master after we have cleaned up the binary DN handling.
Create connection object (nTDSConnection)
Our KCC implementation (in source4/dsdb/kcc) is very simple at the moment. It should work by creating nTDSConnection objects under the nTDSDSA objects in the LDAP tree, then using those to create the repsFrom attributes, and possibly sending DsUpdateRefs operations to the other DCs to set up a repsTo on each replication partner.
Right now we don't create nTDSConnection objects at all, which needs to be fixed.
Update to new doc release
We should look through the new WSPP docs release (from August 2009) and see what we haven't implemented yet, forming a more extensive todo list than this one. Now that we have basic replication working we can start to try to get all the corner cases right, and for that the docs (especially MS-DRSR and MS-ADTS) are a good source of information.
Why isn't repsTo written by Windows?
I have noticed that Windows is not sending us a DsUpdateRefs to update the repsTo when we join a Windows domain as a 2nd DC. This means that if we followed the correct behaviour we would never send Windows a DsReplicaSync message, so we'd never tell Windows to replicate from us.
To work around this dreplsrv_notify_check() currently cheats by using repsFrom if repsTo is empty. We need to instead work out why Windows is not sending us DsUpdateRefs messages. Perhaps related to the lack of nTDSConnection objects?
Update: Discussions with the Microsoft AD team indicate that this is probably caused by delays in the Windows DC adding the Samba DC as a replication partner. The problem resolves itself after about 30 minutes or so. Sometimes running "repadmin /kcc" on the Windows DC helps.
The repadmin.exe tool on windows is a great way of seeing the status of replication. We would like to get all of the options of repadmin working when directed at a Samba4 DC. Anatoliy is working on making some of the functions work, but there are plenty more to do.
Hook delete in repl_meta_data
Right now we just pass delete operations down through the repl_meta_data module to the ldb_tdb backend. That means that deletes are not replicated (as they don't change anything in ReplPropertyMetaData or in the uSNChanged attribute).
We should intercept delete operations and translate them into a combination of a rename to an object in the "Deleted Objects" tree, along with a modify to add the isDeleted attribute. Then we need to set up the tombstone data in the object, and add a tombstone reaping task that would run once a day to really delete expired tombstone records.
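As a rough illustration of the translation described above, here is a minimal Python sketch. It is not the repl_meta_data code; the function name tombstone_ops and the tuple format are made up for the example, and the DN handling is deliberately naive (it assumes no escaped commas). The "\0ADEL:" RDN suffix follows the naming Windows uses for tombstones.

```python
DELETED_OBJECTS_RDN = "CN=Deleted Objects"


def tombstone_ops(dn, guid, nc_root):
    """Return the operations that would replace a plain delete of `dn`.

    Hypothetical helper: `guid` is the objectGUID as a string and
    `nc_root` is the DN of the naming context holding the object.
    """
    # Naive split: assumes the RDN contains no escaped comma.
    rdn = dn.split(",", 1)[0]
    # The tombstone RDN is the old RDN plus an escaped newline, "DEL:"
    # and the object GUID, which keeps it unique in Deleted Objects.
    new_dn = "%s\\0ADEL:%s,%s,%s" % (rdn, guid, DELETED_OBJECTS_RDN, nc_root)
    rename = ("rename", dn, new_dn)
    modify = ("modify", new_dn, {"isDeleted": "TRUE"})
    return [rename, modify]


if __name__ == "__main__":
    for op in tombstone_ops("CN=bob,CN=Users,DC=example,DC=com",
                            "1234", "DC=example,DC=com"):
        print(op)
```

Because both operations change the object, they would show up in replPropertyMetaData and uSNChanged, which is what lets the delete replicate. The reaping task is not shown.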
If DRS replication adds a link to an object that doesn't exist we are supposed to create a "phantom" object, which gets filled in later. We are working around that at the moment by delaying link creation until the end of the transaction for the replica cycle, but we should also support phantom objects.
Sort objects on disk
Some sysadmins might write scripts that rely on the return order of attributes within objects (eg. objectclass first). We sort objects on add in repl_meta_data.c to cope with this but we don't fix the sorting on modify. That should be fixed.
Speed up replmd_ldb_message_element_attid_sort
The replmd_ldb_message_element_attid_sort function is pretty inefficient. We need to avoid the attribute lookups in the sort comparison function.
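One way to avoid lookups inside the comparison is to resolve each attribute's ID once up front and then sort on the precomputed key. The Python sketch below shows the idea only; the real function is C, and the names sort_elements_by_attid and attid_by_name, as well as the ID values in the usage example, are illustrative assumptions.

```python
def sort_elements_by_attid(elements, attid_by_name):
    """Sort (name, values) pairs by attribute ID.

    `attid_by_name` is a hypothetical dict from lower-cased attribute
    name to its numeric ID. Each name is looked up exactly once, before
    sorting, rather than on every comparison.
    """
    # Decorate: one lookup per element.
    decorated = [(attid_by_name[name.lower()], name, values)
                 for name, values in elements]
    # Sort on the precomputed integer only.
    decorated.sort(key=lambda item: item[0])
    # Undecorate: drop the key again.
    return [(name, values) for _, name, values in decorated]


if __name__ == "__main__":
    ids = {"objectclass": 0, "cn": 3, "description": 13}  # example IDs
    msg = [("description", ["x"]), ("objectClass", ["top"]), ("cn", ["a"])]
    print(sort_elements_by_attid(msg, ids))
```

With n elements this does n lookups instead of roughly n log n pairs of lookups.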
Don't allow replication of readonly attribs
We should not allow replication to overwrite readonly attributes. There are other attribute flags we aren't honouring as well. We should check the docs and add support for all the relevant attribute flags.
A RODC (read-only domain controller) is a potentially very useful use case for Samba4. There are quite a lot of changes in replication and attribute filtering that should be done when we are a RODC.
Separate gc partition
Right now the gc partition is just an amalgamation of the normal base partitions, with no filtering (we just set the magic control to say that searches should cross partition boundaries).
We need to decide if we should make a separate ldb for the gc partition, and if so what method we will use to keep it in sync. If we don't create a separate partition then we should add the right filtering to gc searches.
If modify sets attrib to same value then no replPropertyMetaData change
A modify via DRS replication that asks for an attribute to change to the same value it already has should be filtered out by repl_meta_data.c so that the replPropertyMetaData attribute is not updated.
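A minimal sketch of that filter, in Python for brevity. The function name filter_unchanged and the dict-of-lists message format are assumptions for the example, and it assumes values can be compared as unordered lists; the real check would need to use each attribute's schema comparison rules.

```python
def filter_unchanged(current, requested):
    """Drop requested attribute changes that would not alter the object.

    `current` and `requested` map attribute name to a list of values.
    Only attributes whose values really differ are returned, so only
    those would get a replPropertyMetaData update.
    """
    changed = {}
    for attr, new_values in requested.items():
        old_values = current.get(attr)
        # Same set of values (order ignored): nothing to do.
        if old_values is not None and sorted(old_values) == sorted(new_values):
            continue
        changed[attr] = new_values
    return changed


if __name__ == "__main__":
    on_disk = {"description": ["a"], "member": ["x", "y"]}
    incoming = {"description": ["a"], "member": ["y", "x"], "cn": ["new"]}
    print(filter_unchanged(on_disk, incoming))
```

If the filtered result is empty the whole modify could be skipped.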
Fix error mapping (no FOOBAR, and replmd_replicated_request_werror)
We have lots of code that returns WERR_FOOBAR or NT_STATUS_FOOBAR because we didn't know what error to return. We need to go through these and either work out the correct error code, or if that is hard then at least put a reasonable guess of the right error code along with a TODO comment to check it.
We store parentGUID in the object on disk at the moment, whereas we should construct it at runtime when asked for.
Honor attribute replication flag
There is an attribute flag for whether particular attributes should be replicated. We need to check that we get this right.
Update: We are now honoring the replication flags, although not the GC filtering flags or the RODC filtering flags
Double cn fix
When we do a s4<->s4 vampire we end up with the rDN attribute appearing twice on all objects in the new replica. We think this is because we should be filtering the rDN in the getncchanges code, but this needs checking.
Update: this should be fixed by the latest commits from the plugfest
Check for parent exists in replication add and rename
During replication add and rename we need to check that the destination parent exists.
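The check itself could look something like the following Python sketch. The helper names parent_dn and parent_exists are hypothetical, the set of existing DNs stands in for a real ldb search, and the DN parsing assumes no escaped commas.

```python
def parent_dn(dn):
    """Return the parent of `dn`, or None if it has only one component."""
    parts = dn.split(",", 1)  # naive: assumes no escaped commas
    return parts[1] if len(parts) == 2 else None


def parent_exists(existing_dns, dn, nc_root):
    """True if the parent of `dn` is present, or `dn` is the NC root."""
    # The root of a naming context has no parent inside the partition.
    if dn.lower() == nc_root.lower():
        return True
    parent = parent_dn(dn)
    if parent is None:
        return False
    # DNs compare case-insensitively in this sketch.
    known = {d.lower() for d in existing_dns}
    return parent.lower() in known


if __name__ == "__main__":
    have = ["DC=example,DC=com", "CN=Users,DC=example,DC=com"]
    print(parent_exists(have, "CN=bob,CN=Users,DC=example,DC=com",
                        "DC=example,DC=com"))
```

For a rename the same check would be applied to the new DN rather than the old one.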
Handle add where DN exists, but different GUID
We may need to handle the case where a DRS replication comes in for a DN that exists, but with a different GUID. We need to test against Windows to see how this is handled.
Net commands to query repl status (via DRS?)
We should add net commands for querying the replication status (somewhat like repadmin.exe).
Max number of attributes on objects?
Metze noticed that the WSPP docs specify a maximum object size in AD of around 8k. This seems to translate into a maximum number of attributes that windows accepts. We may need to implement a similar limit to prevent problems with replication s4->windows.
Obey acls on objects
We need to obey the ntSecurityDescriptor on objects in our SAM. This is a large task! Nadya is working on it and hopefully will merge soon.
Fix ldb_add objectclass sorting
In ldb_add we sort objectClass attributes in the objectclass module. The sort is currently horribly inefficient - it needs redoing using the sort indexes that Andrew and Nadia have recently added.
-s option to setup_dns.sh
The setup_dns.sh script should be redone as a python wrapper so it obeys standard options like -s and can read smb.conf.
What triggers initial kcc run on windows after we join a w2k8 DC?
After we join a s4 DC to a windows domain, we've noticed that w2k8 needs to be prompted to run its KCC using "repadmin /kcc". We need to work out why this is needed so we can fix it.
Update: see the comments on repsTo update above
s4<->s4 in make test
We should add the s4<->s4 vampire and replication in make test
We need to add the urgent bit on replications that have changed critical objects (see the docs for a list). We will probably need to expand @REPLCHANGED to add a uSNUrgent attribute to support this.
We are not currently obeying group policies, although we can serve them out to clients. We need to obey the ones that make sense for Samba. For this we need to provide a really easy API to allow any part of Samba to query a group policy, and to auto-update SAMDB with the needed changes.
We currently accept the w2k8 linked attributes in replication, but when other DCs replicate from us we serve up linked attributes as normal attributes (as a downlevel w2k3 DC does). We should store the full meta data associated with linked attributes in more fields in the extended DN and serve it up in getncchanges.
Add support for ndr64 to wireshark
When watching w2k8-R2 <-> w2k8-R2 interactions, windows chooses NDR64 instead of NDR. We now support NDR64 in Samba, but wireshark doesn't understand it. To allow us to watch traffic between w2k8-R2 boxes we would like wireshark to understand NDR64.
Convert wireshark drsuapi to pidl
The DRSUAPI decoder in wireshark is quite poor. We should redo it using a pidl based parser.
Fix decryption of w2k8 by wireshark (krb5 patch)
When watching w2k8 <-> samba traffic in wireshark we often find that wireshark cannot decrypt some of the traffic. This is due to a bug/limitation in MIT kerberos. Metze has a hack based on LD_PRELOAD that works around this, but we should try to get this into the wireshark svn tree directly.
bitmap32 actually 3264 in samr QueryUserInfo level 16? (netmon bug too)
There seems to be a problem with the QueryUserInfo level 16 and NDR64. The Microsoft netmon 3.3 parser has the same problem as our ndrdump parser. We need to look into how this should be handled.
Update: this was fixed by the addition of NDR64 union alignment.
How does another DC become the FSMO master and RID master?
We need to work out how a DC should become the FSMO master and RID master. We can do it now via ldbedit, but there should be a more automated method (perhaps the KCC should do this?)
Implement RID Master and RID pools
We need to implement the RID master and allocate RIDs out from the RID pool.
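To make the division of labour concrete, here is a small Python sketch of the general scheme: one master hands out non-overlapping blocks of RIDs, and each DC allocates from its own block until it runs out. The class names, the starting RID and the pool size are all assumptions for the example, not values taken from our code.

```python
class RidMaster:
    """Hands out non-overlapping RID ranges (hypothetical sketch)."""

    def __init__(self, next_rid, pool_size):
        self.next_rid = next_rid
        self.pool_size = pool_size

    def allocate_pool(self):
        # Return an inclusive (low, high) range and advance past it.
        low = self.next_rid
        high = low + self.pool_size - 1
        self.next_rid = high + 1
        return low, high


class RidPool:
    """Per-DC pool that refills from the master when exhausted."""

    def __init__(self, master):
        self.master = master
        self.low, self.high = master.allocate_pool()
        self.next = self.low

    def next_rid(self):
        if self.next > self.high:
            # Pool used up: ask the master for a fresh range.
            self.low, self.high = self.master.allocate_pool()
            self.next = self.low
        rid = self.next
        self.next += 1
        return rid


if __name__ == "__main__":
    master = RidMaster(next_rid=1000, pool_size=3)
    dc_a, dc_b = RidPool(master), RidPool(master)
    print([dc_a.next_rid() for _ in range(4)], dc_b.next_rid())
```

In a real deployment the request to the master would go over the network, so a DC would want to ask for a new range before the current one is completely exhausted.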
Add LDAP backend to BIND that uses AD's in-directory format
We need to serve DNS based on the data in Active Directory. There is an LDAP backend for BIND already, but there is no doubt a lot of work between that and using it against an AD-like database
Add interim DNS zone generator based on sam.ldb
We should have provision generate a zone file based on all the listed domain controllers in sam.ldb, not just a single DC. It could look up the other DCs with DNS to find their IP, and use the specified IP for the new server.
This will help us handle the DNS reproducibly while we wait for the above item.
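A minimal sketch of such a generator is shown below, in Python. It assumes the DC names and addresses have already been collected (from sam.ldb and DNS lookups) into a dict; the function name zone_records is made up, and only a few of the records a real AD zone needs are emitted.

```python
def zone_records(domain, dcs):
    """Build zone file lines for every DC in `dcs`.

    `dcs` maps a short host name to an IP address; how that dict is
    filled from sam.ldb is outside this sketch.
    """
    lines = []
    for host, ip in sorted(dcs.items()):
        fqdn = "%s.%s." % (host, domain)
        # Address record for the DC itself.
        lines.append("%s IN A %s" % (fqdn, ip))
        # Service records so clients can locate LDAP and Kerberos.
        lines.append("_ldap._tcp.%s. IN SRV 0 100 389 %s" % (domain, fqdn))
        lines.append("_kerberos._tcp.%s. IN SRV 0 100 88 %s" % (domain, fqdn))
    return lines


if __name__ == "__main__":
    for line in zone_records("example.com",
                             {"dc2": "10.0.0.2", "dc1": "10.0.0.1"}):
        print(line)
```

Sorting the DCs keeps the output stable between runs, which helps when comparing generated zone files.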
Incorporate nsupdate-gss into the 'net vampire' command
We need to call nsupdate-gss at the end of the 'net vampire' command, so that we don't need to manually run 'setup_dns.sh' after a vampire.
Script and tests for takeover of FSMO
We should have a script to (optionally forcibly) take over the FSMO roles of a domain
(currently this is done by a local modify to our LDB, but a script like the one to raise the functional level would be good)