Samba How to: Difference between revisions

From SambaWiki

Revision as of 20:28, 11 June 2020

Development Practices

Typical development process

The typical development process on Samba looks like this:

  • A developer has a problem to solve. This might be fixing a bug, or implementing some previously unsupported Windows Server functionality.
  • The developer would then write test cases that demonstrate the problem. For server-side behaviour, these test cases would pass when run against a Windows DC, but fail against a Samba DC.
  • The finished tests are integrated into Samba’s self-test and marked as known failures initially. There are a couple of benefits to this approach:
- It’s standard practice in Test-Driven Development (TDD) to help prove that the new test-case works correctly.
- It means git bisect can be run over the codebase, which can help to identify any degradations introduced to Samba. After any given commit, the Samba code will always compile, and will always pass all tests.
  • The developer then writes the code to fix the bug or implement the desired functionality. The known failure status for the new tests is removed, the new tests are re-run, and this time they should all pass.
  • The developer should then run the full Continuous Integration (CI) test suite over their changes, to verify they haven’t broken any existing functionality. The GitLab CI provides a convenient way to do this, although there are several other approaches. Developers outside the Samba Team can submit patches to run against the GitLab CI by following the Merge Requests process on the Contribute page.
  • The developer should end up with a coherent set of patches that add the new functionality, along with tests that prove the new functionality works correctly. They then submit the patch-set as explained on the Contribute page.
  • The code is reviewed by Samba Team members. While any developer can potentially contribute changes to the Samba codebase, only Samba Team members have the access rights to actually deliver code changes to the master code branch. Usually the reviewers provide some feedback on how the patches could be further improved.
  • Once the reviewer is happy, the code must then pass a final CI test run before it’s incorporated into the main Samba codebase.

The following sections cover the Continuous Integration and Code Review process in more detail, as these steps are particularly important to maintaining the quality of the Samba codebase.

Continuous Integration

Samba's autobuild system is the core of our CI system.

Code Review

See Samba's Code Review Guidelines

Coding Style

Quick Start

Coding style guidelines are about reducing the number of unnecessary reformatting patches and making things easier for developers to work together. You don't have to like them or even agree with them, but once put in place we all have to abide by them (or vote to change them). However, coding style should never outweigh coding itself and so the guidelines described here are hopefully easy enough to follow as they are very common and supported by tools and editors.

The basic style for C code is the Linux kernel coding style (see Documentation/CodingStyle in the kernel source tree). This closely matches what most Samba developers use already anyway, with a few exceptions as mentioned below.

The coding style for Python code is documented in PEP8. New Python code should be compatible with Python 2.6, 2.7, and Python 3.4 onwards. This means using Python 3 syntax with the appropriate 'from __future__' imports.

But to save you the trouble of reading the Linux kernel style guide, here are the highlights.

  • Maximum Line Width is 80 Characters
- This is not about people with low-res screens. Rather, sticking to 80 columns prevents you from easily nesting more than one level of if statements or other code blocks. Use source3/script/count_80_col.pl to check your changes.
  • Use 8 Space Tabs to Indent
- No whitespace fillers.
  • No Trailing Whitespace
- Use source3/script/strip_trail_ws.pl to clean up your files before committing.
  • Follow the K&R guidelines
- We won't go through all of them here. You have a copy of "The C Programming Language" anyway, right? You can also use the format_indent.sh script found in source3/script/ if all else fails.

Editor Hints

Emacs

Add the following to your $HOME/.emacs file:

 (add-hook 'c-mode-hook
       (lambda ()
               (c-set-style "linux")
               (c-toggle-auto-state)))

Vi

(Thanks to SATOH Fumiyasu <fumiyas@osstech.jp> for these hints):

For the basic vi editor included with all variants of *nix, add the following to $HOME/.exrc:

 set tabstop=8
 set shiftwidth=8

For Vim, the following settings in $HOME/.vimrc will also deal with displaying trailing whitespace:

 if has("syntax") && (&t_Co > 2 || has("gui_running"))
       syntax on
       function! ActivateInvisibleCharIndicator()
               syntax match TrailingSpace "[ \t]\+$" display containedin=ALL
               highlight TrailingSpace ctermbg=Red
       endf
       autocmd BufNewFile,BufRead * call ActivateInvisibleCharIndicator()
 endif
 " Show tabs, trailing whitespace, and continued lines visually
 set list listchars=tab:»·,trail:·,extends:…
 
 " highlight overly long lines same as TODOs.
 set textwidth=80
 autocmd BufNewFile,BufRead *.c,*.h exec 'match Todo /\%>' . &textwidth . 'v.\+/'

clang-format

 BasedOnStyle: LLVM
 IndentWidth: 8
 UseTab: true
 BreakBeforeBraces: Linux
 AllowShortIfStatementsOnASingleLine: false
 IndentCaseLabels: false
 BinPackParameters: false
 BinPackArguments: false
 SortIncludes: false

Comments

Comments should always use the standard C syntax. C++ style comments are not currently allowed.

The lines before a comment should be empty. If the comment directly belongs to the following code, there should be no empty line after the comment, except if the comment contains a summary of multiple following code blocks.

This is good:

 ...
 int i;
 
 /*
  * This is a multi line comment,
  * which explains the logical steps we have to do:
  *
  * 1. We need to set i=5, because...
  * 2. We need to call complex_fn1
  */
 
 /* This is a one line comment about i = 5. */
 i = 5;
 
 /*
  * This is a multi line comment,
  * explaining the call to complex_fn1()
  */
 ret = complex_fn1();
 if (ret != 0) {
 ...
 
 /**
  * @brief This is a doxygen comment.
  *
  * This is a more detailed explanation of
  * this simple function.
  *
  * @param[in]   param1     The parameter value of the function.
  *
  * @param[out]  result1    The result value of the function.
  *
  * @return              0 on success and -1 on error.
  */
 int example(int param1, int *result1);

This is bad:

 ...
 int i;
 /*
  * This is a multi line comment,
  * which explains the logical steps we have to do:
  *
  * 1. We need to set i=5, because...
  * 2. We need to call complex_fn1
  */
 /* This is a one line comment about i = 5. */
 i = 5;
 /*
  * This is a multi line comment,
  * explaining the call to complex_fn1()
  */
 ret = complex_fn1();
 if (ret != 0) {
 ...
 
 /*This is a one line comment.*/
 
 /* This is a multi line comment,
    with some more words...*/
 
 /*
  * This is a multi line comment,
  * with some more words...*/

Indentation & Whitespace & 80 columns

To avoid confusion, indentations have to be tabs with length 8 (not 8 ' ' characters). When wrapping parameters for function calls, align the parameter list with the first parameter on the previous line. Use tabs to get as close as possible and then fill in the final 7 characters or less with whitespace. For example,

 var1 = foo(arg1, arg2,
            arg3);

The previous example is intended to illustrate alignment of function parameters across lines and not as encouragement for gratuitous line splitting. Never split a line before columns 70 - 79 unless you have a really good reason. Be smart about formatting.

One exception to the previous rule is function calls, declarations, and definitions. In function calls, declarations, and definitions, either the declaration is a one-liner, or each parameter is listed on its own line. The rationale is that if there are many parameters, each one should be on its own line to make tracking interface changes easier.
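A sketch of the two permitted forms follows. The function names and their behaviour are hypothetical and exist only to show the layout: the first signature fits on one line, while the second would exceed 80 columns and therefore lists each parameter on its own line.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: short enough to stay on one line. */
size_t example_min(size_t a, size_t b)
{
	if (a < b) {
		return a;
	}
	return b;
}

/*
 * Hypothetical helper: the parameter list does not fit on one line,
 * so each parameter is listed on its own line.
 */
int example_copy_range(const uint8_t *src,
		       size_t src_len,
		       size_t offset,
		       size_t count,
		       uint8_t *dst,
		       size_t dst_len)
{
	if (offset > src_len) {
		return -1;
	}
	if (count > src_len - offset) {
		return -1;
	}
	if (count > dst_len) {
		return -1;
	}

	memcpy(dst, src + offset, count);
	return 0;
}
```

With this layout, adding or removing a parameter shows up as a single changed line in a diff.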

If, switch, & Code blocks

Always follow an 'if' keyword with a space but don't include additional spaces following or preceding the parentheses in the conditional. This is good:

 if (x == 1)

This is bad:

 if ( x == 1 )

Yes, we have a lot of code that uses the second form and we are trying to clean it up without being overly intrusive.

Note that this is a rule about parentheses following keywords and not functions. Don't insert a space between the name and the left parenthesis when invoking functions.

Braces for code blocks used by for, if, switch, while, do..while, etc. should begin on the same line as the statement keyword and end on a line of their own. You should always include braces, even if the block only contains one statement. NOTE: Functions are different and the beginning left brace should be located in the first column on the next line.

If the beginning statement has to be broken across lines due to length, the beginning brace should be on a line of its own.

The exception to the ending rule is when the closing brace is followed by another language keyword such as else or the closing while in a do..while loop.

Good examples:

 if (x == 1) {
         printf("good\n");
 }
 
 for (x=1; x<10; x++) {
         printf("%d\n", x);
 }
 
 for (really_really_really_really_long_var_name=0;
      really_really_really_really_long_var_name<10;
      really_really_really_really_long_var_name++)
 {
         printf("%d\n", really_really_really_really_long_var_name);
 }
 
 do {
         printf("also good\n");
 } while (1);
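The else exception and the function-brace rule described earlier can be sketched the same way. The function name here is hypothetical. Its opening brace sits in the first column of the next line, while each else shares a line with the preceding closing brace.

```c
/* Function definitions put the opening brace in the first column. */
const char *example_sign(int x)
{
	/* "} else if" and "} else {" follow the closing brace directly. */
	if (x < 0) {
		return "negative";
	} else if (x == 0) {
		return "zero";
	} else {
		return "positive";
	}
}
```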

Bad examples:

 while (1)
 {
         printf("I'm in a loop!\n"); }
 
 for (x=1;
      x<10;
      x++)
 {
         printf("no good\n");
 }
 
 if (i < 10)
         printf("I should be in braces.\n");

Goto

While many people have been taught that "goto"s are fundamentally evil, they can greatly enhance readability and reduce memory leaks when used as the single exit point from a function. But in no Samba world whatsoever is a goto outside of a function or block of code a good idea.

Good Examples:

 int foo(int y)
 {
         int *z = NULL;
         int ret = 0;
 
         if (y < 10) {
                 z = malloc(sizeof(int) * y);
                 if (z == NULL) {
                         ret = 1;
                         goto done;
                 }
         }
 
         printf("Allocated %d elements.\n", y);
 
  done:
         if (z != NULL) {
                 free(z);
         }
 
         return ret;
 }

Primitive Data Types

Samba has large amounts of historical code that uses data types which are now commonly provided by the C99 standard. However, when that code was written, types such as boolean and exact-width integers did not exist, and Samba developers were forced to provide their own. Now that these types are guaranteed to be available either as part of the compiler's C99 support or from lib/replace/, new code should adhere to the following conventions:

  • Booleans are of type "bool" (not BOOL)
  • Boolean values are "true" and "false" (not True or False)
  • Exact width integers are of type [u]int[8|16|32|64]_t

Most of the time a good name for a boolean variable is 'ok'. Here is an example we often use:

 bool ok;
 
 ok = foo();
 if (!ok) {
         /* do something */
 }

It makes the code more readable and is easy to debug.
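A minimal sketch that combines these conventions, assuming a hypothetical helper named example_checksum(): it returns a bool for success or failure and uses exact-width integer types for the data it handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum a byte buffer into a 32-bit value. */
bool example_checksum(const uint8_t *buf, size_t len, uint32_t *sum)
{
	uint32_t total = 0;
	size_t i;

	if (buf == NULL || sum == NULL) {
		return false;
	}

	for (i = 0; i < len; i++) {
		total += buf[i];
	}

	*sum = total;
	return true;
}
```

A caller would store the result in a variable named ok and test it with if (!ok), exactly as in the example above.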

Typedefs

Samba tries to avoid "typedef struct { .. } x_t;", so we always try to use "struct x { .. };". We know there are still such typedefs in the code, but for new code, please don't do that anymore.
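As a short illustration of the preferred form, using a hypothetical structure and accessor (neither is part of Samba): the type is declared with a plain struct tag, so the "struct" keyword remains visible wherever the type is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Preferred: a plain struct tag, no typedef. */
struct example_entry {
	uint32_t id;
	const char *name;
};

/* Hypothetical accessor: returns 0 when no entry is given. */
uint32_t example_entry_id(const struct example_entry *entry)
{
	if (entry == NULL) {
		return 0;
	}
	return entry->id;
}
```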

Initialize pointers

All pointer variables MUST be initialized to NULL. History has demonstrated that uninitialized pointer variables have led to various bugs and security issues.

Pointers MUST be initialized even if the assignment directly follows the declaration, like pointer2 in the example below, because the instruction sequence may change over time.

Good Example:

 char *pointer1 = NULL;
 char *pointer2 = NULL;
 
 pointer2 = some_func2();
 
 ...
 
 pointer1 = some_func1();

Bad Example:

 char *pointer1;
 char *pointer2;
 
 pointer2 = some_func2();
 
 ...
 
 pointer1 = some_func1();

Make use of helper variables

Please try to avoid passing function calls as function parameters in new code. This makes the code much easier to read and it's also easier to use the "step" command within gdb.

Good Example:

 char *name = NULL;
 int ret;
 
 name = get_some_name();
 if (name == NULL) {
         ...
 }
 
 ret = some_function_my_name(name);
 ...


Bad Example:

 ret = some_function_my_name(get_some_name());
 ...

Please try to avoid passing function return values to if- or while-conditions. The reason for this is better handling of code under a debugger.

Good example:

 x = malloc(sizeof(short)*10);
 if (x == NULL) {
         fprintf(stderr, "Unable to alloc memory!\n");
 }

Bad example:

 if ((x = malloc(sizeof(short)*10)) == NULL ) {
         fprintf(stderr, "Unable to alloc memory!\n");
 }

There are exceptions to this rule. One example is walking a data structure in an iterator style:

 while ((opt = poptGetNextOpt(pc)) != -1) {
         ... do something with opt ...
 }

Another exception is DBG messages, for example when printing a SID or a GUID. Here we don't expect any surprises from the printing functions, and the main reason for this guideline is to make debugging easier. That reason rarely applies to this particular use case, and we gain some efficiency because the DBG_ macros don't evaluate their arguments if the debug level is not high enough.

 if (!NT_STATUS_IS_OK(status)) {
         struct dom_sid_buf sid_buf;
         struct GUID_txt_buf guid_buf;
         DBG_WARNING(
             "objectSID [%s] for GUID [%s] invalid\n",
             dom_sid_str_buf(objectsid, &sid_buf),
             GUID_buf_string(&cache->entries[idx], &guid_buf));
 }

But in general, please try to avoid this pattern.

Control-Flow changing macros

Macros like NT_STATUS_NOT_OK_RETURN that change control flow (return/goto/etc) from within the macro are considered bad, because they look like function calls that never change control flow. Please do not use them in new code.

The only exception is the test code that depends on repeated use of calls like CHECK_STATUS, CHECK_VAL and others.
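To make the concern concrete, here is a sketch with a hypothetical macro, EXAMPLE_NOT_OK_RETURN, and plain int return codes standing in for NTSTATUS. Both functions behave identically, but only the second shows its early return at the call site.

```c
/* Hypothetical macro, for illustration only: hides a return statement. */
#define EXAMPLE_NOT_OK_RETURN(rc) do { \
	if ((rc) != 0) { \
		return (rc); \
	} \
} while (0)

/* Hypothetical step that fails for negative input. */
int example_step(int value)
{
	if (value < 0) {
		return -1;
	}
	return 0;
}

/* Discouraged: the early return is invisible at the call site. */
int example_with_macro(int value)
{
	int rc;

	rc = example_step(value);
	EXAMPLE_NOT_OK_RETURN(rc);

	return 0;
}

/* Preferred in new code: the control flow is spelled out. */
int example_explicit(int value)
{
	int rc;

	rc = example_step(value);
	if (rc != 0) {
		return rc;
	}

	return 0;
}
```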

Error and out logic

Don't do this:

 frame = talloc_stackframe();
 
 if (ret == LDB_SUCCESS) {
         if (result->count == 0) {
                 ret = LDB_ERR_NO_SUCH_OBJECT;
         } else {
                 struct ldb_message *match =
                         get_best_match(dn, result);
                 if (match == NULL) {
                         TALLOC_FREE(frame);
                         return LDB_ERR_OPERATIONS_ERROR;
                 }
                 *msg = talloc_move(mem_ctx, &match);
         }
 }
 
 TALLOC_FREE(frame);
 return ret;

It should be:

 frame = talloc_stackframe();
 
 if (ret != LDB_SUCCESS) {
         TALLOC_FREE(frame);
         return ret;
 }
 
 if (result->count == 0) {
         TALLOC_FREE(frame);
         return LDB_ERR_NO_SUCH_OBJECT;
 }
 
 match = get_best_match(dn, result);
 if (match == NULL) {
         TALLOC_FREE(frame);
         return LDB_ERR_OPERATIONS_ERROR;
 }
 
 *msg = talloc_move(mem_ctx, &match);
 TALLOC_FREE(frame);
 return LDB_SUCCESS;

DEBUG statements

Use the following macros instead of DEBUG:

 DBG_ERR         log level 0             error conditions
 DBG_WARNING     log level 1             warning conditions
 DBG_NOTICE      log level 3             normal, but significant, condition
 DBG_INFO        log level 5             informational message
 DBG_DEBUG       log level 10            debug-level message

Example usage:

 DBG_ERR("Memory allocation failed\n");
 DBG_DEBUG("Received %d bytes\n", count);

The messages from these macros are automatically prefixed with the function name.
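The general technique behind such a prefix can be sketched with the standard __func__ identifier. The macro below is hypothetical and is not Samba's implementation. It writes into a caller-supplied buffer rather than the log so that the result is easy to inspect.

```c
#include <stdio.h>

/*
 * Hypothetical macro, not Samba's implementation: formats a message
 * into buf and prefixes it with the name of the calling function.
 */
#define EXAMPLE_DBG(buf, len, fmt, ...) \
	snprintf((buf), (len), "%s: " fmt, __func__, __VA_ARGS__)

/* Hypothetical caller: the prefix becomes "example_receive: ". */
int example_receive(char *buf, size_t len, int count)
{
	return EXAMPLE_DBG(buf, len, "Received %d bytes\n", count);
}
```

Because __func__ is expanded inside the calling function, every call site gets its own function name without the caller having to type it.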

Samba codebase organization

Broadly speaking, the Samba source-code tree can be organized into the following major groups:

  • Top-level libraries, which contain common code shared amongst the Samba processes.
  • Source3, which is code primarily used by the file server and domain member.
  • Source4, which is code primarily used by the Active Directory Domain Controller.
  • Infrastructure components, which provide the build and test framework for Samba.

The following sections break down the codebase layout in more detail. This is not intended to be a comprehensive directory, and just covers the major components.


Top-Level libraries

At the time of the merge, all code was located in either the source3 or source4 directory. Over time, as duplicate code between the two branches becomes merged or used in common, the code is moved out into the top-level of the source-code tree.

The major library components at the top-level are:

  • Third-party libraries Samba needs some specific libraries to build. Some of these are included in the Samba source tree to aid in building on older and non-Linux platforms.
  • General purpose libraries Samba, like any large program written in C, has a number of internal helper functions that do not implement the protocols but are required to share code and make the rest of Samba possible.
The sub-projects talloc, tdb, tevent and ldb live here, in the lib directory.
  • PIDL is Samba’s code auto-generation system for generating C code and C-Python bindings from IDL.
  • python contains Samba’s Python library. It is not generally used in the file server, but is critical for the AD DC.
  • CTDB Samba’s clustered database (which enables the clustered file server).

Source3

The source3 directory is home to code primarily used by the file server and domain member. source3 contains the following major components:

  • The SMB file server (smbd) is the file server that most people think of when they think of Samba.
  • The NBT name server (nmbd) provides NetBIOS over TCP/IP (NBT) for those who want it.
  • Winbindd provides the connection between Samba and the AD Domain to which it is joined, for authentication and name lookup. It also manages the IDMAP, being the mapping between unix UID/GID values and Windows SID values. winbindd is used in both Domain member and AD Domain Controller modes.
  • RPC client library (librpc) contains the parts of Samba’s RPC client implementation that are specific to the source3 subtree.
  • SMB client library (libsmb) contains the parts of Samba’s SMB client implementation used in the source3 subtree.
  • Authentication server (auth) contains the parts of Samba’s NTLM authentication server used in the source3 subtree. A shim module connects this to the source4 authentication code when Samba is an AD DC.
  • Password database (passdb) contains the NTLM password database used in the source3 subtree. A shim module connects this to the sam.ldb data store when Samba is an AD DC.
  • RPC server contains the source3 RPC server. However, most parts of this are not used in the AD DC, but instead are redirected to the equivalent parts of source4/rpc_server. When used, this provides the classic or NT4-like DC either as a DC or to service the SAM on each member or standalone server (each Windows machine has a database under its own name, which Samba does too).
  • Print server functionality is located in the printing directory. Also relevant is the source3/rpc_server/spoolss code.



Source4

The source4 directory is home to code primarily used by the Active Directory Domain Controller. source4 contains the following major components:

  • Active Directory Database templates located in setup. These templates fill out the basic structure of an Active Directory DC in the sam.ldb. This includes the full schema definition.
  • Heimdal is an (old) branch/fork of Heimdal with some changes. An attempt is made to sync this Samba fork with a tree called lorikeet-heimdal (which is a true branch/fork of Heimdal). Patches applied here should first be incorporated upstream, however this has not always happened.
  • General purpose libraries (lib) that have not yet been migrated to the top level.
  • Client library (libcli) contains the parts of Samba’s client implementation for our protocols specific to the source4 codebase.
  • RPC client library (librpc) contains the parts of Samba’s RPC client implementation specific to the source4 codebase.
  • libgpo contains Group Policy Object support.
  • smbtorture binary, used for testing Samba and Windows. For historical reasons there are two smbtorture frameworks. The source4 framework is the one being extended at this time, but some tests will remain in source3/torture.
  • Old NTVFS file server and VFS layer. The attempt at a new file server architecture is preserved in the following directories. These demonstrated a new VFS layer that is organised around the SMB and NTFS semantics rather than the POSIX semantics that Samba used in smbd at the time (smbd now uses a hybrid approach).
  • AD Services. The core AD DC is implemented in the named folders for each component:
  • Authentication server (auth) contains parts of Samba’s authentication server used in the AD DC. A shim module connects smbd to this authentication code when Samba is an AD DC.
  • The Directory Services DB (DSDB), which provides the main implementation behind the sam.ldb database (covered in more detail below).


Directory Services DB (DSDB)

The code that implements the main AD database is located in source4/dsdb. The dsdb directory contains the following notable components:

  • LDB modules The LDB library provides a generic framework where custom plug-in modules can be added to modify the database’s behaviour. DSDB uses the LDB library framework and defines its own set of plug-in modules (located in dsdb/samdb/ldb_modules) that are specific to Active Directory. The result is a database that provides the full AD semantics.
  • Schema handling The sam.ldb database follows and conforms to the AD schema. The handling for loading and using the full AD schema is located in source4/dsdb/schema.
  • Replication handling (part) Some of the code related to handling AD’s DRS replication is located in source4/dsdb/repl.
  • KCC The Knowledge Consistency Checker (KCC) is a process that ensures that a valid replication graph is maintained and other periodic cleanup work is done. Parts of the implementation are located in source4/dsdb/kcc, mostly for historical reasons. Other KCC handling is also located in python/samba/kcc.



Infrastructure components

The source-code tree contains the following components that are used to build and test Samba.

  • Selftest is a bespoke framework for unit and integration testing. The tests themselves are located in many different parts of the source tree.
  • Wintest is a system that sits outside Samba’s selftest. Wintest builds and installs Samba and runs some limited testing against Windows automatically. Note that this system is not currently maintained or in use.
  • Build system. The code in buildtools uses the wscript files in each directory of the source tree in order to build Samba.
  • Documentation. Samba’s manpages are constructed from XML and are located in the docs-xml directory. In particular the smb.conf manpage is constructed from a whole sub-directory of files in here.
  • Note that the internal list of valid parameters in Samba is created from the XML documentation of each configuration parameter, ensuring the code and documentation are always consistent. Documented defaults are also checked for consistency in the automated test-suites.


Autogenerated code

Note that a significant amount of Samba’s codebase is autogenerated from IDL (Interface Definition Language) files. This code is spread across the source-code tree (i.e. source3, source4, and top-level libraries).

PIDL generates push (serialize, or pack) and pull (deserialize, or unpack) functions for all the structures described using IDL, and structures marked [public] are exposed in public functions in C and Python. This is very helpful for parsing not just DCE/RPC packets but any other regularly structured buffer. The IDL files are located:

For complex structures that don’t quite fit into IDL, a marker [nopull], [nopush], or [noprint] can be specified. Hand-written parsers can then be written to handle these structures. These manual parsers are located in:

See the PIDL page for specifics on PIDL syntax and examples.
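As a rough illustration of what serialization and deserialization functions for a structure do, here is a hand-written sketch. It is not PIDL output and does not follow the NDR wire format. The structure, the function names, the little-endian byte layout, and the 0 / -1 return convention are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure, standing in for one described in IDL. */
struct example_header {
	uint16_t version;
	uint32_t length;
};

#define EXAMPLE_HEADER_SIZE 6

/* Serialize: write the fields into buf in little-endian byte order. */
int example_header_pack(const struct example_header *h,
			uint8_t *buf,
			size_t buf_len)
{
	if (buf_len < EXAMPLE_HEADER_SIZE) {
		return -1;
	}

	buf[0] = (uint8_t)(h->version & 0xFF);
	buf[1] = (uint8_t)(h->version >> 8);
	buf[2] = (uint8_t)(h->length & 0xFF);
	buf[3] = (uint8_t)((h->length >> 8) & 0xFF);
	buf[4] = (uint8_t)((h->length >> 16) & 0xFF);
	buf[5] = (uint8_t)(h->length >> 24);

	return 0;
}

/* Deserialize: rebuild the structure from the same byte layout. */
int example_header_unpack(const uint8_t *buf,
			  size_t buf_len,
			  struct example_header *h)
{
	if (buf_len < EXAMPLE_HEADER_SIZE) {
		return -1;
	}

	h->version = (uint16_t)(buf[0] | (buf[1] << 8));
	h->length = (uint32_t)buf[2] |
		    ((uint32_t)buf[3] << 8) |
		    ((uint32_t)buf[4] << 16) |
		    ((uint32_t)buf[5] << 24);

	return 0;
}
```

Writing such pairs by hand for every structure is repetitive and error-prone, which is the kind of work that generating them from an IDL description avoids.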