Code Coverage

Revision as of 04:08, 21 May 2019 by Timbeale (talk | contribs) (source3/nmbd (NETBIOS): typos)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Code coverage is generated via a periodic Gitlab pipeline and is aggregated from merging the result of each individual job (which together runs all the available tests). This includes tests for LDB and TDB, which have their own individual build systems (and currently excludes CTDB). This page only describes C code coverage currently.

LCOV coverage for C code is available here:

https://samba-team.gitlab.io/devel/samba/

Total coverage

Last updated: 2019-05-14 17:41:12

Line coverage: 39.8% (519470 / 1304847)
Function coverage: 42.0% (30933 / 73567)

Function coverage is generally better than line coverage across the code base. In a number of cases where function coverage is not 100%, the calls are stubbed out and return errors such as 'Not Implemented'. Much of the test code has high levels of code coverage indicating that these tests are actually being run correctly. From looking at what is being covered, it does not seem like there is much trivial testing that has been obviously missed and much of it requires a more in-depth investigation.

Line coverage exceeding 75% normally indicates reasonably good coverage in a file. Reaching higher than this (and 100%) is often impossible without heavy instrumentation because of the nature of C code. When inspecting these files, you can notice that many of the allocation errors and general error cases are not well covered. Under normal operation, and even in extreme error conditions, many of these cases are practically impossible to hit. In some cases, as a result of defensive coding, these cases are in fact impossible to hit (but allow refactoring and other unexpected external changes to preserve the correct behaviour).

Line coverage below 60% may be in need of some further testing, although do note that in some cases, these files have more error cases or difficult to test scenarios and so 60% (or below) may already indicate reasonable coverage.

Notable areas

bin/default/librpc/gen_ndr (Generated Code)

Line coverage: 18.1% (104224 / 574711)

Generated code contributes to almost half of all detected C code. This code is generated using PIDL which reads IDL files and generates (network) marshalling code. In particular, it writes binary serialization and deserialization functions for structures sent over network protocols, formatted print routines for structures and in some cases generates remote procedure call (RPC) interfaces. These RPC interfaces are generally for DCE/RPC protocols implemented by Active Directory and internal messaging. They are often stubbed and allow manual implementation elsewhere in the code.

In many cases, there are no callers of our formatted print routines for many structures. Many structures remain unused, and many structures are deprecated (e.g. v3, v5 are from earlier versions of the protocol and v8 is commonly in use), as well as many functions defined in various RPC protocol specifications. In fact, many exposed functions which are at the latest version may still be superseded by non-RPC mechanisms, or clients generally perform these operations over a different protocol e.g. LDAP.

One other caveat with the generated code is that there is a significant amount of dead-code generated because PIDL is quite simple and may duplicate a number of different checks that LCOV might be detecting. Regardless, the coverage statistics for this set of code is very low and any ways we can improve this warrant investigation.

source4/heimdal (Kerberos Implementation Forked from Heimdal)

This code implements our Kerberos Key Distribution Center (KDC) and includes a number of client libraries and crypto. This is a fork of Heimdal (known as Lorikeet Heimdal) from some time ago with security fixes and some of our own patches. Although this forms a critical component of the Samba Active Directory implementation, a number of the components in the Heimdal code base which we have copied into our source tree are not used (source4/heimdal/lib/ntlm). Similarly, any tests that come with Heimdal and could be exercising different functionality are not being run within our code. Due to the way we wrap the KDC and our own independent implementation of different security layers (GENSEC and GSSAPI), many paths in this code are never used and should never be used.

source4/dsdb/samdb/ldb_modules (LDB Modules)

Line coverage: 73.0% (16673 / 22843)

In terms of the Active Directory code, the LDB modules implement much of the underlying AD logic as database layers. Aggregating the entire LDB module stack gives an LDAP-like database with AD compatibility, which is then wrapped with our LDAP server. The core LDB implements a LDAP-like database (without any constraints), backed by a key value store (TDB, a Samba-based project or LMDB). The modules code and core LDB code (and TDB) in the lib/ folder have similar levels of code coverage as the modules. Much of the LDAP server seems to be similarly tested (or slightly better).

Some notable exceptions in the code coverage are modules which are not run frequently or at all. There is an example skel.c in lib/ldb/modules for describing how to build an LDB module. In the Samba AD specific modules, in source4/dsdb/samdb/ldb_modules, there is count_attrs which is a statistics gathering module for development purposes and some historical attempts at OpenLDAP backends (parts which exists in a number of places in the code base).

source4/torture and source3/torture (Test Code)

These describe a number of tests written in C code (as opposed to our Python tests), our primary testing method in the past but sometimes written due to the difficulty of creating external bindings. The coverage of the source4 code is notably better than the source3. This likely warrants more investigation, although it seems to be that benchmarking code (for performance reasons) is not triggered by our CI.

source3/torture

Line coverage: 47.1% (6519 / 13830)
Function coverage: 58.1% (293 / 504)

source4/torture

Varies by directory, generally 80%+ line coverage besides nbench.

source3/rpc_server, source4/rpc_server (DCE/RPC)

These directories implement remote procedure call servers described by MSRPC and the DCE/RPC protocol. A significant portion of functionality is duplicated between the two servers and when an AD DC is run, most calls should be directed towards the source4 server. In the file server, the source3 server is run, and even while the source4 server is running due to Active Directory, this server is still partly exposed. Since there isn't a single consolidated server, and with Active Directory related features only implemented in the source4 server, testing is inconsistent. The source3 server appears to have poorer testing, which is also a consequence of being much older but also because of the mixture of functionality that was eventually provided by source4.

A notable exception to the mostly high coverage of the source4 server is the DNSSERVER remote DNS administration pipe. The BROWSER pipe appears poorly tested, but in reality, the entire pipe is full of stubs and perform no useful functionality regardless.

As there are a number of functions implemented by our RPC servers, many of which are not used by modern clients or are only used very rarely, our rpcclient tool for manually triggering many of these calls is also not well covered by testing. Lots of these combinations which are available in our servers and generated by our generic client tool, may end up in stubs or protocol errors (which aren't that meaningful).

source3/rpcclient

Line coverage: 6.2% (494 / 8025)

source4/ntvfs (File Server Not Running in Production)

In the fork of Samba to implement the Active Directory, a new file server (NTVFS) was written. This only implements some of the existing functionality of the original server and had a number of stubs. With the merge of the AD components back into the file server code base, this server was disabled by default but many tests still relied on it. These tests were not comprehensive at all and many had little meaningful dependence on the server but had some implementation specific details recorded in the tests. This leads to this codebase having less than 50% line coverage in parts.

source3/nmbd (NETBIOS)

Line coverage: 38.0% (2175 / 5728)
Function coverage: 53.4% (164 / 307)

Provides Netbios name services and implements a number of protocols and functions (some of which may be considered obsolete with new versions of Windows). In Active Directory, nbtserver is what is used instead to provide this functionality and nmbd is not normally used (nor can be used). DNS often replaces the use of Netbios, with DNS names being preferred than the short Netbios name. In some of our code, Netbios would only be used as the fallback and so much of this code may not be hit unless there are resolution issues.

source3/winbindd (Winbind)

Line coverage: 33.8% (6677 / 19765)
Function coverage: 51.2% (511 / 998)

Winbind is responsible for a wide variety of operations in the Samba codebase. Many operations relate to inter-process or inter-DC communication and asynchronous resolution. One of the main purposes of Winbind is to maintain domain membership in an Active Directory (or NT4) domain. Winbind is also responsible for forwarding authentication when it cannot be done locally e.g. Read only domain controllers (RODC) with only some passwords stored.

Winbind is complicated and has a large number of fallbacks which may often not be used (Netbios resolution when DNS is unavailable). Many consumers of Winbind are internal and many functions are not exposed, leading to some functions which may be unnecessary or duplicated.

In the Active Directory implementation (source4/winbindd), calls are forwarded to this version of Winbind in source3 (a previous version existed as a result of the AD fork).

source3/smbd (SMB/CIFS File Server)

Line coverage: 63.4% (26729 / 42164)
Function coverage: 82.3% (1346 / 1636)

Implements the file server used as a standalone or as part of the Active Directory domain controller. This implements SMB1, SMB2 and parts of SMB3. Older versions of the protocol were written a long time ago and may lack the appropriate testing. Parts of the older protocols were also overly complex and may be reflected in the code. These protocols are quite complex in general and require a number of different races and contentions to exercise the networking locking behaviour. Compared to non-networked code, the amount of errors cases and mixtures of these cases are likely to be higher and harder to trigger.

source3/modules (Samba File Server VFS Modules)

Line coverage: 37.7% (9698 / 25692)
Function coverage: 47.5% (868 / 1828)

In order to implement SMB network file semantics and Windows ACL semantics, there are virtual file system (VFS) modules to intercept and transform file operations into operations on a POSIX / Linux based system (and file system). A number of these modules are optional and there are a few in the source tree due to vendors (like anti-virus scanning) to simplify maintenance burdens. Since they have performance implications or other external dependencies (like clustered environments like Ceph), many are not turned on by default and are not tested in our default testing environments.

Normally our testing is done on the EXT4 file system, but by swapping out our VFS modules we support other file systems which are not as well tested if you look at code coverage, as they are not run in our normal CI and build pipelines. Some of our tests depend heavily on EXT4 semantics and are hard-coded, so simply switching the backend file system is not necessarily appropriate. Furthermore, in situations like clustered environments, the semantics and guarantees may not be at the same level.

third_party (Third Party Code)

These are our third party sources, which are pulled in as tarballs or full copies of their respective code bases. In some cases, they might have their own build files, tests or other artifacts which might normally be triggered in upstream development. Like Heimdal, we may not use much of the codebase and only use certain functions.

Part of the reason there is a third party directory in our source tree is so that we do not have to reimplement functionality (and we use the most well known implementations) and worry about their respective testing or maintenance.

Areas of code to consider removing or consolidating

Group Policy Code

There have been many attempts at implementing group policy in Samba, to different extents and having different goals. Lots of the code is not even run or exercised in any real way and does not run in production by default.

Registry Code

This lives in a similar state to the group policy code and is complicated by differing implementations from the Samba file server, Active Directory fork that occurred in the code base's history.

NTVFS File server

Described in the earlier section, many tests need to be changed to run on the non NTVFS file server.