Samba codebase organization

From SambaWiki
Revision as of 20:44, 12 June 2020

Introduction

Broadly speaking, the Samba source-code tree can be organized into the following major groups:

  • Top-level libraries, which contain common code shared amongst the Samba processes.
  • Source3, which is code primarily used by the file server and domain member.
  • Source4, which is code primarily used by the Active Directory Domain Controller.
  • Infrastructure components, which provide the build and test framework for Samba.
  • Autogenerated code, which is used for parsing DCE/RPC packets as well as other regularly structured buffers.

The following sections break down the codebase layout in more detail. This is not intended to be a comprehensive directory; it covers only the major components.

Top-Level libraries

At the time of the merge of the Samba 3 and Samba 4 codebases, all code was located in either the source3 or source4 directory. Over time, as duplicated code between the two branches is merged or shared, it is moved out to the top level of the source-code tree.

The major library components at the top level are:

  • Third-party libraries. Samba needs some specific libraries to build. Some of these are included in the Samba source tree to aid building on older and non-Linux platforms.
  • General purpose libraries. Like any large program written in C, Samba has a number of internal helper functions that do not implement the protocols but are required to share code and make the rest of Samba possible.
The sub-projects talloc, tdb, tevent and ldb also live here, in the lib directory.
  • Common authentication library (auth) contains the parts of Samba’s authentication implementation that are common to source3 and source4.
  • PIDL is Samba’s code auto-generation system for generating C code and C-Python bindings from IDL.
  • python contains Samba’s Python library. It is not generally used in the file server, but is critical for the AD DC.
  • CTDB is Samba’s clustered database (which enables the clustered file server).

Source3

The source3 directory is home to code primarily used by the file server and domain member. source3 contains the following major components:

  • The SMB file server (smbd) is the file server that most people think of when they think of Samba.
  • The NBT name server (nmbd) provides NetBIOS over TCP/IP (NBT) for those who want it.
  • Winbindd provides the connection between Samba and the AD domain to which it is joined, for authentication and name lookup. It also manages IDMAP, the mapping between Unix UID/GID values and Windows SID values. Winbindd is used in both domain member and AD Domain Controller modes.
  • RPC client library (librpc) contains the parts of Samba’s RPC client implementation that are specific to the source3 subtree.
  • SMB client library (libsmb) contains the parts of Samba’s SMB client implementation used in the source3 subtree.
  • Authentication server (auth) contains the parts of Samba’s NTLM authentication server used in the source3 subtree. A shim module connects this to the source4 authentication code when Samba is an AD DC.
  • Password database (passdb) contains the NTLM password database used in the source3 subtree. A shim module connects this to the sam.ldb data store when Samba is an AD DC.
  • RPC server contains the source3 RPC server. Most parts of it are not used in the AD DC; those requests are instead redirected to the equivalent parts of source4/rpc_server. When used, it provides the classic (NT4-like) DC, either acting as a DC or servicing the SAM on each member or standalone server (each Windows machine has a database under its own name, and Samba does too).
  • Print server functionality is located in the printing directory. Also relevant is the source3/rpc_server/spoolss code.
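
As a rough illustration of what ID mapping involves, the sketch below maps a domain SID to a Unix ID. It takes the SID's last sub-authority (the RID) and offsets it into a configured ID range, in the spirit of a RID-based mapping scheme. The domain SID, the range values, and the helper names `parse_sid`, `sid_to_unix_id` and `unix_id_to_sid` are hypothetical. This is not winbindd's code or its configuration syntax.

```python
from __future__ import annotations

# Hypothetical sketch of RID-based ID mapping; not winbindd's implementation.


def parse_sid(sid: str) -> tuple:
    """Split 'S-1-5-21-x-y-z-RID' into (domain SID, RID)."""
    parts = sid.split("-")
    if len(parts) < 4 or parts[0] != "S":
        raise ValueError(f"not a SID: {sid!r}")
    return "-".join(parts[:-1]), int(parts[-1])


def sid_to_unix_id(sid: str, domain_sid: str, range_low: int, range_high: int) -> int:
    """Map a SID from domain_sid into [range_low, range_high] using its RID."""
    dom, rid = parse_sid(sid)
    if dom != domain_sid:
        raise LookupError(f"{sid} is not in domain {domain_sid}")
    unix_id = range_low + rid
    if unix_id > range_high:
        raise OverflowError(f"RID {rid} does not fit in the configured range")
    return unix_id


def unix_id_to_sid(unix_id: int, domain_sid: str, range_low: int, range_high: int) -> str:
    """Reverse mapping: recover the SID from a Unix ID inside the range."""
    if not range_low <= unix_id <= range_high:
        raise LookupError(f"{unix_id} is outside the configured range")
    return f"{domain_sid}-{unix_id - range_low}"


if __name__ == "__main__":
    DOMAIN = "S-1-5-21-1004336348-1177238915-682003330"  # made-up domain SID
    uid = sid_to_unix_id(DOMAIN + "-1105", DOMAIN, 10000, 999999)
    print(uid)  # 11105
    print(unix_id_to_sid(uid, DOMAIN, 10000, 999999))
```

Because this mapping is pure arithmetic, it needs no stored state. The cost is that each range can serve only one domain. Schemes that allocate IDs and record them in a database avoid that limit.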



Source4

The source4 directory is home to code primarily used by the Active Directory Domain Controller. source4 contains the following major components:

  • Active Directory Database templates located in setup. These templates fill out the basic structure of an Active Directory DC in the sam.ldb. This includes the full schema definition.
  • Heimdal is an (old) fork of the Heimdal Kerberos implementation with some changes. An attempt is made to keep this Samba fork in sync with a tree called lorikeet-heimdal (which is a true fork of Heimdal). Patches applied here should first be incorporated upstream; however, this has not always happened.
  • General purpose libraries (lib) that have not yet been migrated to the top level.
  • Client library (libcli) contains the parts of Samba’s client implementation for our protocols specific to the source4 codebase.
  • RPC client library (librpc) contains the parts of Samba’s RPC client implementation specific to the source4 codebase.
  • libgpo contains Group Policy Object support.
  • smbtorture binary, used for testing Samba and Windows. For historical reasons there are two smbtorture frameworks: the source4 framework is the one being extended at this time, but some tests will remain in source3/torture.
  • Old NTVFS file server and VFS layer. The attempt at a new file server architecture is preserved in the following directories. These demonstrated a new VFS layer that is organised around the SMB and NTFS semantics rather than the POSIX semantics that Samba used in smbd at the time (smbd now uses a hybrid approach).
  • AD Services. The core AD DC is implemented in the named folders for each component:
  • Authentication server (auth) contains parts of Samba’s authentication server used in the AD DC. A shim module connects smbd to this authentication code when Samba is an AD DC.
  • The Directory Services DB (DSDB), which provides the main implementation behind the sam.ldb database (covered in more detail below).
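
Turning these templates into sam.ldb entries at provision time is essentially placeholder substitution. The sketch below assumes `${NAME}`-style placeholders and uses Python's `string.Template`, which accepts that syntax. The template text, the `DOMAINDN` variable, and the helpers `fill_template` and `domain_to_dn` are illustrative assumptions. They are not copied from the files in setup or from Samba's provisioning code.

```python
from string import Template

# Hypothetical LDIF template fragment; real setup templates are far larger.
TEMPLATE = """\
dn: ${DOMAINDN}
objectClass: top
objectClass: domain

dn: CN=Users,${DOMAINDN}
objectClass: top
objectClass: container
cn: Users
"""


def fill_template(text: str, values: dict) -> str:
    # substitute() raises KeyError if any placeholder is left unfilled,
    # so a missing provisioning value is caught before anything is written.
    return Template(text).substitute(values)


def domain_to_dn(dns_domain: str) -> str:
    """Convert 'samdom.example.com' to 'DC=samdom,DC=example,DC=com'."""
    return ",".join(f"DC={label}" for label in dns_domain.split("."))


if __name__ == "__main__":
    print(fill_template(TEMPLATE, {"DOMAINDN": domain_to_dn("samdom.example.com")}))
```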


Directory Services DB (DSDB)

The code that implements the main AD database is located in source4/dsdb. The dsdb directory contains the following notable components:

  • LDB modules. The LDB library provides a generic framework where custom plug-in modules can be added to modify the database’s behaviour. DSDB uses the LDB library framework and defines its own set of plug-in modules (located in dsdb/samdb/ldb_modules) that are specific to Active Directory. The result is a database that provides the full AD semantics.
  • Schema handling. The sam.ldb database follows and conforms to the AD schema. The handling for loading and using the full AD schema is located in source4/dsdb/schema.
  • Replication handling (part). Some of the code related to handling AD’s DRS replication is located in source4/dsdb/repl.
  • KCC. The Knowledge Consistency Checker (KCC) is a process that ensures that a valid replication graph is maintained and other periodic cleanup work is done. Parts of the implementation are located in source4/dsdb/kcc, mostly for historical reasons. Other KCC handling is located in python/samba/kcc.
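
The plug-in idea behind the LDB modules can be pictured as a stack. A request enters at the top, and each module may modify or reject it before handing it to the next, until it reaches the storage backend. The real LDB module interface is a C API. The classes below (`Module`, `Backend`, `RequireObjectClass`, `AddWhenCreated`) are hypothetical names used only to sketch that flow for an "add" request.

```python
from __future__ import annotations

from typing import Callable, Optional


class Module:
    """One layer in a hypothetical module stack; by default it just forwards."""

    def __init__(self, next_module: Optional[Module] = None):
        self.next = next_module

    def add(self, dn: str, attrs: dict) -> None:
        if self.next is None:
            raise RuntimeError("no backend at the bottom of the stack")
        self.next.add(dn, attrs)


class Backend(Module):
    """Bottom of the stack: stores entries in an in-memory dict."""

    def __init__(self):
        super().__init__(None)
        self.records = {}

    def add(self, dn: str, attrs: dict) -> None:
        if dn in self.records:
            raise KeyError(f"entry already exists: {dn}")
        self.records[dn] = dict(attrs)


class RequireObjectClass(Module):
    """Rejects entries with no objectClass (a stand-in for schema checks)."""

    def add(self, dn: str, attrs: dict) -> None:
        if not attrs.get("objectClass"):
            raise ValueError(f"{dn}: objectClass is required")
        super().add(dn, attrs)


class AddWhenCreated(Module):
    """Adds a generated timestamp attribute before forwarding the request."""

    def __init__(self, next_module: Module, clock: Callable[[], str]):
        super().__init__(next_module)
        self.clock = clock

    def add(self, dn: str, attrs: dict) -> None:
        super().add(dn, {**attrs, "whenCreated": self.clock()})


def build_stack(clock: Callable[[], str]):
    """The top of the stack is the first module a request reaches."""
    backend = Backend()
    top = AddWhenCreated(RequireObjectClass(backend), clock)
    return top, backend


if __name__ == "__main__":
    top, backend = build_stack(lambda: "20200612204400.0Z")
    top.add("CN=alice,CN=Users,DC=samdom,DC=example,DC=com", {"objectClass": ["user"]})
    print(backend.records)
```

Each module knows only its successor. New behaviour can therefore be added by inserting a layer without touching the backend.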



Infrastructure components

The source-code tree contains the following components that are used to build and test Samba.

  • Selftest is a bespoke framework for unit and integration testing. The tests themselves are located in many different parts of the source tree.
  • Wintest is a system that sits outside Samba’s selftest. Wintest builds and installs Samba and runs some limited testing against Windows automatically. Note that this system is not currently maintained or in use.
  • Build system. The code in buildtools uses the wscript files in each directory of the source tree to build Samba.
  • Documentation. Samba’s manpages are constructed from XML and are located in the docs-xml directory. In particular the smb.conf manpage is constructed from a whole sub-directory of files in here.
  • Note that the internal list of valid parameters in Samba is created from the XML documentation of each configuration parameter, ensuring the code and documentation are always consistent. Documented defaults are also checked for consistency in the automated test-suites.
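
Deriving the parameter list from the documentation, and checking defaults against it, can be sketched as follows. The XML layout here is a simplified, hypothetical schema, not the real format under docs-xml: a `<parameter>` element with `name` and `type` attributes and a `<default>` child. The function names `load_parameter_table` and `check_defaults` are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parameter documentation (not Samba's real schema).
DOCS = """\
<parameters>
  <parameter name="workgroup" type="string">
    <description>Workgroup or domain name.</description>
    <default>WORKGROUP</default>
  </parameter>
  <parameter name="log level" type="integer">
    <description>Debug verbosity.</description>
    <default>0</default>
  </parameter>
</parameters>
"""


def load_parameter_table(xml_text: str) -> dict:
    """Build {name: (type, documented default)} from the documentation XML."""
    table = {}
    for param in ET.fromstring(xml_text).iter("parameter"):
        name = param.get("name")
        if name in table:
            raise ValueError(f"duplicate parameter: {name}")
        table[name] = (param.get("type"), param.findtext("default"))
    return table


def check_defaults(table: dict, code_defaults: dict) -> list:
    """Return the names whose compiled-in default differs from the docs."""
    return sorted(
        name
        for name, (_, documented) in table.items()
        if name in code_defaults and str(code_defaults[name]) != documented
    )


if __name__ == "__main__":
    table = load_parameter_table(DOCS)
    print(sorted(table))
    print(check_defaults(table, {"workgroup": "WORKGROUP", "log level": 1}))
```

Generating the table from the XML means a parameter cannot exist in the code without documentation. The default check then catches drift between the two.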


Autogenerated code

A significant amount of Samba’s codebase is autogenerated from IDL (Interface Definition Language) files. This code is spread across the source-code tree (i.e. source3, source4, and the top-level libraries).

PIDL generates push (serialize, or pack) and pull (deserialize, or unpack) functions for all the structures described using IDL, and structures marked [public] are exposed through public functions in C and Python. This is very helpful for parsing not just DCE/RPC packets but any other regularly structured buffer. The IDL files are located:

For complex structures that don’t quite fit into IDL, a marker [nopull], [nopush], or [noprint] can be specified. Hand-written parsers can then be written to handle these structures. These manual parsers are located in:

See the PIDL page for specifics on PIDL syntax and examples.
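
A push/pull pair for a fixed-layout structure can be sketched as below. PIDL's real output is C, with functions named along the lines of ndr_push_<type> and ndr_pull_<type>. This Python version only mimics the shape of such a pair, assuming little-endian encoding with NDR-style natural alignment. The structure `example_hdr`, its fields, and the function names are hypothetical.

```python
import struct
from dataclasses import dataclass


# Hypothetical structure; in IDL it might be written roughly as
#   typedef [public] struct { uint16 flags; uint32 size; } example_hdr;
@dataclass
class ExampleHdr:
    flags: int  # uint16
    size: int   # uint32


def _align(offset: int, n: int) -> int:
    """Round offset up to the next multiple of n (natural alignment)."""
    return (offset + n - 1) // n * n


def push_example_hdr(hdr: ExampleHdr) -> bytes:
    """Serialize (pack): uint16, zero padding to a 4-byte boundary, uint32."""
    buf = bytearray(struct.pack("<H", hdr.flags))
    buf += b"\x00" * (_align(len(buf), 4) - len(buf))
    buf += struct.pack("<I", hdr.size)
    return bytes(buf)


def pull_example_hdr(data: bytes) -> ExampleHdr:
    """Deserialize (unpack), refusing truncated buffers."""
    size_offset = _align(2, 4)
    if len(data) < size_offset + 4:
        raise ValueError("buffer too short for example_hdr")
    (flags,) = struct.unpack_from("<H", data, 0)
    (size,) = struct.unpack_from("<I", data, size_offset)
    return ExampleHdr(flags, size)


if __name__ == "__main__":
    wire = push_example_hdr(ExampleHdr(flags=0x0102, size=0x0A0B0C0D))
    print(wire.hex())  # 020100000d0c0b0a
    print(pull_example_hdr(wire))
```

Push and pull are mirror images, so a round trip (pulling what was just pushed) makes a useful test. A hand-written parser for a [nopull] or [nopush] structure would need that same property checked by hand.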