Samba CI on gitlab/Debugging CI failures

From SambaWiki
Revision as of 01:16, 12 March 2019 by Abartlet (Add a common error)


Using Docker to debug GitLab CI failures

GitLab CI uses Docker, which provides a way to run applications securely isolated in a container, packaged with all their dependencies and libraries.

To install Docker on Ubuntu, follow the instructions on this page:

Once Docker is installed, run:

docker run -ti /bin/bash

Then clone the Samba git repository containing your changes and run the failing test.

The exact command to run is shown in the log of the failed pipeline.

A more complete example, which makes your local Samba git repository available inside the container for cloning, is to run the following from your Samba source directory:

docker run -ti --mount type=bind,source="$(pwd)",target=/src,ro /bin/bash

And then from within the docker session:

git clone /src
cd src

... and then build and run the test as usual.
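If you start such containers often, a small wrapper saves retyping the mount options. This is a minimal sketch, not part of Samba: the function name ci_shell is hypothetical, and IMAGE is an assumption you must set yourself to the CI image used by the failed job (this page does not name it here). Setting DOCKER=echo prints the docker command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Samba): start a CI container with the
# current directory bind-mounted read-only at /src, as in the example above.
# IMAGE is an assumption: set it to the Samba CI image name from the registry.
# DOCKER may be overridden (e.g. DOCKER=echo) to print the command instead.
ci_shell() {
    : "${IMAGE:?set IMAGE to the Samba CI image name}"
    ${DOCKER:-docker} run -ti \
        --mount type=bind,source="$(pwd)",target=/src,ro \
        "$IMAGE" /bin/bash
}
```

Run it from your Samba source directory with IMAGE set, then use git clone /src inside the container as shown above. The mount stays read-only, so nothing done in the container can modify your host checkout.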

Finding the right autobuild command to run in docker

The .gitlab-ci.yml file lists the autobuild command run for each job, matching the split of jobs in the GitLab pipeline GUI. The same command is also printed (in green) at the top of each job's log, which makes it easy to copy into the docker shell. For example:

script/ samba-none-env    --verbose --nocleanup --keeplogs --tail --testbase /tmp/samba-testbase

Other container tooling

The container images stored in our registry and used for CI can also be pulled and run with other tools such as podman, but to replicate the environment on the runners as closely as possible, use Docker.

make test

Many issues that show up in CI reproduce easily by running the individual test.

  • Most of these issues will reproduce locally on your normal development system
  • Otherwise you may need to use the docker container described above (which has the reference set of packages)

Build Samba with

./configure --enable-developer
make -j

And run the test with

make test TESTS=mytest
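The three steps above can be chained so that the test only runs if configure and the build succeed. This is a minimal sketch with a hypothetical function name, to be run from the top of a Samba source tree; CONFIGURE and MAKE are assumptions added only so the commands can be overridden (for example with echo) to see what would run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Samba): configure, build and run one test,
# stopping at the first step that fails. Run from the top of the source tree.
# CONFIGURE and MAKE may be overridden (e.g. with echo) for a dry run.
build_and_run_test() {
    test_name=${1:?usage: build_and_run_test TESTNAME}
    ${CONFIGURE:-./configure} --enable-developer &&
        ${MAKE:-make} -j &&
        ${MAKE:-make} test TESTS="$test_name"
}
```

For example, build_and_run_test mytest. The sketch reruns configure every time for simplicity; on an already configured tree you can skip straight to the make steps.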

Getting patches back out of a Docker session

If you have made changes inside a docker runtime container:

Tell git who you are

git config --global user.name "Fred Nurk"
git config --global user.email "fred@example.com"

Make a proper commit within the container runtime

git add --patch
git commit -s -m 'My commit message'

Export the patch back to your host

docker exec [CONTAINER_ID] sh -c 'cd samba;git format-patch -1 --stdout' > /tmp/patch.txt

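You can typically find the [CONTAINER_ID] as the part after the @ in the shell prompt, because by default Docker sets a container's hostname to its short container ID. On the host, docker ps also lists the IDs of running containers. Adjust cd samba in the command above to the directory holding your checkout (src if you followed the cloning example above).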
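If you do this often, both steps can be wrapped in small helpers. This is a minimal sketch: the function names are hypothetical, the default directory samba and output file /tmp/patch.txt simply mirror the command above, and DOCKER may be set to echo for a dry run. The directory name is inserted inside single quotes, so it must not itself contain a single quote.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Samba or Docker).

# Print the text between the '@' and the first ':' of a prompt such as
# root@3f2a1b4c5d6e:/src#, which by default is the short container ID.
container_id_from_prompt() {
    rest=${1#*@}
    printf '%s\n' "${rest%%:*}"
}

# Export the most recent commit in DIR (default: samba) of container CID
# to OUTFILE (default: /tmp/patch.txt). DOCKER may be overridden for a dry run.
export_patch() {
    cid=${1:?usage: export_patch CONTAINER_ID [DIR] [OUTFILE]}
    dir=${2:-samba}
    out=${3:-/tmp/patch.txt}
    # '&&' so that a failed cd does not run format-patch in the wrong place
    ${DOCKER:-docker} exec "$cid" sh -c "cd '$dir' && git format-patch -1 --stdout" > "$out"
}
```

For example, export_patch 3f2a1b4c5d6e src writes the patch to /tmp/patch.txt, which you can then apply on the host with git am.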


Points to note

Notable Pipeline error conditions

fatal: reference is not a tree

If a branch is pushed to twice in quick succession, the already started CI pipeline may fail with errors like:

fatal: reference is not a tree: f27116a9a0d047629d074bc14c18caf6139731e2

This just means that the runner lost the race with your new push and could not get the old git hash. Your new CI run is in another pipeline.
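To confirm that a failed pipeline was only for a superseded push, you can compare the hash from the error with the current tip of the branch on the remote. This is a minimal sketch with a hypothetical function name, assuming the remote is called origin unless you pass another name. Note that it also reports "superseded" if the branch no longer exists on the remote.

```shell
#!/bin/sh
# Hypothetical helper (not part of Samba or GitLab): succeed if HASH is no
# longer the tip of BRANCH on REMOTE (default: origin), i.e. a newer push
# has replaced the commit the failed pipeline tried to check out.
is_superseded() {
    hash=${1:?usage: is_superseded HASH BRANCH [REMOTE]}
    branch=${2:?usage: is_superseded HASH BRANCH [REMOTE]}
    # git ls-remote prints "<hash><TAB><ref>"; keep only the hash
    tip=$(git ls-remote "${3:-origin}" "refs/heads/$branch" | cut -f1)
    [ "$tip" != "$hash" ]
}
```

For example, is_superseded f27116a9a0d047629d074bc14c18caf6139731e2 mybranch && echo "look at the newer pipeline instead".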

Resource limitations

The 'private' runners are 4-CPU virtual machines with 8 GB of RAM. These run in Rackspace's cloud and are paid for by the Samba Team from a credit with Rackspace.

The 'shared' runners are 1-CPU virtual machines with 4 GB of RAM. The name is a misnomer: they are not shared VMs, but access to the newly booted VMs is provided to us (and paid for) by

Some tests fail or flap on GitLab CI due to resource limitations. This can cause:

  • Docker exit code 137 (128 + 9, i.e. the process received kill -9, most likely from the out-of-memory killer)
  • Test failures because tests do not run fast enough (timeouts or timing-dependent failures)
  • Race conditions (AD schema and DRS replication are particularly prone to this)

Tests should be reworked to be more memory-efficient, more robust against poor CPU scheduling, and race-free, but in the meantime this is worth being aware of.
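To check whether a failure is resource-related, you can run the CI image with limits resembling a 'shared' runner and decode exit codes above 128. This is a rough sketch with hypothetical function names; IMAGE is an assumption you must set, and Docker's --cpus and --memory limits on a larger host only approximate a small VM (for example, swap and scheduling behave differently). CPUS=4 MEM=8g approximates a 'private' runner instead.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Samba or Docker).

# Start the CI image with CPU and memory limits similar to a 'shared' runner.
# IMAGE is an assumption; CPUS and MEM default to 1 CPU and 4 GB.
ci_shell_limited() {
    : "${IMAGE:?set IMAGE to the Samba CI image name}"
    ${DOCKER:-docker} run -ti --cpus="${CPUS:-1}" --memory="${MEM:-4g}" \
        "$IMAGE" /bin/bash
}

# Exit statuses above 128 mean the process was killed by signal (status - 128).
explain_exit_code() {
    code=${1:?usage: explain_exit_code STATUS}
    if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
    else
        echo "exited with status $code"
    fi
}
```

For example, explain_exit_code 137 prints "killed by signal 9", matching the out-of-memory case above.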

Long hostnames

sn-devel is a nice short hostname, as is laptop. Specifically, such names are less than 14 characters long, so they do not need to be truncated.

Due to the way the GitLab CI instances are booted under docker, they get long hostnames like runner-191a8437-project-6378020-concurrent-0, which can cause difficult-to-diagnose issues unless the hostname is always overridden in the test.
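To reproduce such issues locally, you can give your own container a similarly long hostname using docker run's --hostname option. This is a minimal sketch with a hypothetical function name; IMAGE is an assumption you must set, and the default name simply reuses the example hostname above (override it with LONG_HOSTNAME).

```shell
#!/bin/sh
# Hypothetical helper (not part of Samba): start the CI image with a long,
# runner-style hostname. IMAGE is an assumption; DOCKER may be set to echo.
ci_shell_long_hostname() {
    : "${IMAGE:?set IMAGE to the Samba CI image name}"
    name=${LONG_HOSTNAME:-runner-191a8437-project-6378020-concurrent-0}
    ${DOCKER:-docker} run -ti --hostname "$name" "$IMAGE" /bin/bash
}
```

Running hostname inside the resulting container should then print the long name, so tests that do not override it behave as they would on a runner.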