How GitLab CI works in Samba

Running remote scripts, displaying the output

Like the Samba build farm of old, GitLab CI is a system for running scripts on remote hosts against a git checkout.

Pipelines

Samba uses a feature called GitLab Pipelines to orchestrate our CI.

In-repo configuration

In Samba's case, the remote script is script/autobuild.py plus some housekeeping before and after. The details are recorded in the .gitlab-ci*.yml files in the Samba tree, so they are maintained alongside the code.

See also an introduction to setting up GitLab CI

.gitlab-ci-private.yml vs .gitlab-ci.yml

We have two different CI configurations. One uses the default name, .gitlab-ci.yml, so it is picked up automatically by forks of our repository. The other, .gitlab-ci-private.yml, is the one we specify for the common development repository.

The .gitlab-ci-private.yml file includes .gitlab-ci.yml so as to avoid duplication.

The motivation is to use the shared runners wherever possible, as these are provided by gitlab.com at no cost to the Samba Team.

Wrapping containers

To get a consistent build environment, container images are used, so the scripts described above all run inside a container.

The image used is defined in the .gitlab-ci.yml file.

GitLab CI is best thought of as a fancy way to run commands in containers and report their results.

Docker

GitLab CI uses Docker as the container runtime.

Other tools can consume the container image format and start containers from it, but to replicate the environment on the runners as closely as possible, use Docker.

A bit like running in a chroot

The way GitLab CI uses containers is very much akin to downloading a tarball (the image), unpacking it and calling chroot into it (entering the container). Modern container features such as namespaces make this more seamless, but this mental model may help those who find the container concepts unfamiliar.

On a private VM

To allow us to accept and test code from a broader range of contributors, and to enable scaling at times of peak load, the Docker container is started in a private VM using Docker Machine. This applies to both the private and the shared (provided by gitlab.com) runners.

Multiple VMs in parallel

Each job (a top-level section) in the .gitlab-ci*.yml files is distributed to an independent VM, allowing the jobs to execute in parallel.
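
As a rough illustration of how a pipeline definition maps to independently scheduled jobs, the Python sketch below takes an already-parsed configuration (a plain dict standing in for a loaded .gitlab-ci.yml) and lists the entries that would run as jobs. GitLab treats keys starting with a dot as hidden templates rather than jobs; the reserved-keyword set and the job names are illustrative assumptions, not GitLab's exhaustive keyword list or Samba's real job names.

```python
from typing import Any, Dict, List

# Top-level keys that configure the whole pipeline rather than define a job.
# Assumption: a simplified subset of GitLab's global keywords.
GLOBAL_KEYWORDS = {
    "default", "include", "stages", "variables", "workflow",
    "image", "services", "cache", "before_script", "after_script",
}


def list_jobs(config: Dict[str, Any]) -> List[str]:
    """Return the names of top-level entries that would run as jobs."""
    jobs = []
    for name in config:
        if name in GLOBAL_KEYWORDS:
            continue  # pipeline-wide setting, not a job
        if name.startswith("."):
            continue  # hidden key, e.g. a template reused via `extends`
        jobs.append(name)
    return jobs


# Hypothetical configuration: job names and commands are placeholders.
example = {
    "stages": ["build", "test"],
    "variables": {"GIT_DEPTH": "1"},
    ".shared_template": {"image": "example-image"},
    "compile_check": {"stage": "build", "script": ["echo build"]},
    "test_part_a": {"stage": "test", "script": ["echo test a"]},
    "test_part_b": {"stage": "test", "script": ["echo test b"]},
}

for job in list_jobs(example):
    # In Samba's setup each of these would be handed to its own VM.
    print(job)
```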

Samba's GitLab CI architecture

Scale of Samba's use

Based on GitLab.com's graphs of pipeline use on our shared development repository:

In 2020 (Jan-Nov) we started 315 pipelines per month on average
- 171 (54%) succeeded
This may have been around 500,000 minutes of CI per month
- (range 360,000 - 661,500, depending on how much CI the failed jobs consumed)
Samba's Rackspace bill (private VMs only) is around $800 USD per month.
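
The range above can be reproduced from the pipeline counts. This sketch assumes, for illustration, that a full pipeline uses 35 VMs for about 60 minutes each (the figures under "Per pipeline usage") and that a failed pipeline consumes anywhere between nothing and a full pipeline's worth of CI:

```python
PIPELINES_PER_MONTH = 315
SUCCEEDED_PER_MONTH = 171
VMS_PER_PIPELINE = 35
MINUTES_PER_VM = 60  # assumption: roughly one hour per VM

FULL_PIPELINE_MINUTES = VMS_PER_PIPELINE * MINUTES_PER_VM  # 2,100

# Lower bound: failed pipelines consumed no CI at all.
low = SUCCEEDED_PER_MONTH * FULL_PIPELINE_MINUTES
# Upper bound: failed pipelines used as much CI as the successful ones.
high = PIPELINES_PER_MONTH * FULL_PIPELINE_MINUTES

print(f"{SUCCEEDED_PER_MONTH / PIPELINES_PER_MONTH:.0%} succeeded")  # 54% succeeded
print(f"{low:,} - {high:,} CI minutes per month")  # 359,100 - 661,500 CI minutes per month
```

The lower bound of 359,100 minutes rounds to the 360,000 quoted above.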

Per pipeline usage

Each pipeline (after recent optimisation) uses:

35 virtual machines
Around 1 hour each (ideally under an hour, due to cloud billing policies)
1,200 minutes (20 hours) of total elapsed time (1 hour wall clock)

The GitLab.com shared runners we use are small, but our cloud VMs are set to 4 CPUs and 8 GB RAM for the bigger jobs.

While this can be optimised, assuming everything ran on the same VM specification, this currently means around 12,000 VM hours and 48,000 CPU hours per month.

We are working to ensure jobs are marked as interruptible and that a compile check runs first, to reduce redundant VM use.
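
A back-of-envelope version of this estimate, assuming (as the paragraph above does) that every VM is the 4 CPU size, and additionally that each VM runs for about an hour:

```python
PIPELINES_PER_MONTH = 315
VMS_PER_PIPELINE = 35
HOURS_PER_VM = 1   # assumption: "around 1 hour each"
CPUS_PER_VM = 4    # assumption: every VM uses the 4 CPU / 8 GB size

vm_hours = PIPELINES_PER_MONTH * VMS_PER_PIPELINE * HOURS_PER_VM
cpu_hours = vm_hours * CPUS_PER_VM

print(f"~{vm_hours:,} VM hours, ~{cpu_hours:,} CPU hours per month")
# ~11,025 VM hours, ~44,100 CPU hours per month
```

This lands on the same order of magnitude as the rounded figures above; a slightly longer average run per VM pushes it towards 12,000 VM hours.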

GitLab.com shared runners

The parts of our CI that can run on GitLab.com shared runners currently run at no expense to the Samba Team.

These minutes are currently free (due to a bug), but GitLab has announced that they will be capped in the future.

In the future, 1,000 CI minutes on the shared runners will be priced at $10 USD.

At times of high load (and presumably once the quota is enforced and used up), jobs that can run on the shared runners are run on the private VMs instead. This is done by registering the shared tag on our private runners but having them check for jobs only every 20 seconds, so when the shared runners are available, GitLab.com wins the race to schedule the job first.
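
The scheduling race can be modelled very simply. This Python sketch assumes that each pool of runners polls at a fixed interval starting from time zero, and that the first poll at or after a job is queued claims it (ties going to the shared runners). The 3 second shared-runner interval is an illustrative assumption (it matches gitlab-runner's default check_interval); real GitLab scheduling is more involved than this.

```python
import math
from typing import Optional


def next_poll(queued_at: float, interval: float) -> float:
    """First poll at or after `queued_at` for a runner polling every `interval` s."""
    return math.ceil(queued_at / interval) * interval


def winner(queued_at: float, shared_interval: Optional[float],
           private_interval: float) -> str:
    """Which pool claims a job queued at `queued_at` seconds.

    shared_interval=None models the shared runners being unavailable (high load).
    """
    private = next_poll(queued_at, private_interval)
    if shared_interval is None:
        return "private"
    shared = next_poll(queued_at, shared_interval)
    return "shared" if shared <= private else "private"


# Jobs queued every 0.5 s over one minute; shared runners poll every 3 s
# (assumed), private runners every 20 s.
times = [i * 0.5 for i in range(120)]
private_share = sum(winner(t, 3, 20) == "private" for t in times) / len(times)
print(f"private runners win {private_share:.0%} of the races")  # 5%
print(winner(10.0, None, 20))  # shared runners unavailable -> "private"
```

Under these assumptions the private runners only pick up a job while shared runners are available when their poll happens to land just before the next shared-runner poll.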

GitLab Gold offer

The Samba Team may in the future take up the offer to become a GitLab Gold for Open Source customer, which would provide more capacity (50,000 minutes per month as of Nov 2020).
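
For scale, the quota can be compared with the usage figures on this page. This is simple arithmetic that assumes, for illustration, that every pipeline counts 1,200 minutes against the quota:

```python
GOLD_MINUTES_PER_MONTH = 50_000
MINUTES_PER_PIPELINE = 1_200   # per-pipeline figure from "Per pipeline usage"
PIPELINES_PER_MONTH = 315      # 2020 average from "Scale of Samba's use"

full_pipelines_covered = GOLD_MINUTES_PER_MONTH // MINUTES_PER_PIPELINE
share = full_pipelines_covered / PIPELINES_PER_MONTH

print(f"{full_pipelines_covered} pipelines/month, about {share:.0%} of current use")
# 41 pipelines/month, about 13% of current use
```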

Private VMs

Need

Not all of Samba's CI jobs 'fit' in the resources provided by a GitLab.com shared runner, which appear to be 2 CPU machines. In particular, Samba requires an ext4 file system for some tests to pass, and on a private VM we can ensure that is the case.

Rationale

The Samba Team has chosen to use a cloud provider and private VMs (one VM per run). This means we need not fully trust the users we schedule jobs for (members of the shared development repository): gitlab-runner terminates the VM at the end of the job, and the isolation of the VM from the host is assured by the cloud provider, which needs it for its own security.

This may not be as cost-effective as hosting a gitlab-runner on a shared dedicated machine, but it carries less ongoing risk and maintenance.

Current status

The Samba Team provides the private VMs in the Rackspace cloud, paid for by the team using donations.

A single host running gitlab-runner is registered to the shared development repo.

That host is configured to autoscale using docker-machine.
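
The effect of giving every job a fresh VM, combined with the concurrency limit of 40 mentioned under CI Cloud Requirements, can be sketched as a simple queue. This is a simplified model, not gitlab-runner's actual autoscaling algorithm: it assumes jobs start in queue order, each VM runs exactly one job and is then discarded, and VM boot time is ignored.

```python
import heapq
from typing import List, Tuple


def simulate(job_minutes: List[int], limit: int) -> Tuple[int, int]:
    """Return (wall-clock minutes until the last job finishes, VMs created)."""
    running: List[int] = []  # min-heap of finish times of the busy VMs
    now = 0
    vms_created = 0
    for duration in job_minutes:           # jobs start in queue (FIFO) order
        if len(running) >= limit:
            now = heapq.heappop(running)   # wait until the earliest VM is torn down
        heapq.heappush(running, now + duration)
        vms_created += 1                   # a fresh VM per job, never reused
    return (max(running) if running else 0), vms_created


one_pipeline = [60] * 35   # 35 jobs of about an hour each
print(simulate(one_pipeline, limit=40))      # (60, 35)
print(simulate(one_pipeline * 2, limit=40))  # (120, 70)
```

Once the limit is reached, extra jobs queue for a free slot and the wall-clock time grows, even though each job still gets its own short-lived VM.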

Ansible management scripts

The scripts that manage the bastion host are a set of Ansible roles and were developed by Catalyst.

The script to rebuild the bastion host is a good place to start.

The scripts used to configure and operate this service are available.

These scripts allow a new bastion host to be fully built with a single script invocation:

gitlab-ci/one-step-rebuild-rackspace.sh

Future CI services

As all the complex parts of Samba's build and test system still live below autobuild, migrating to a different CI service, either in the future or in parallel, is quite practical.

For example, in the past there was parallel operation with Travis CI before the team abandoned GitHub.

Not tied to gitlab.com

If needed, private GitLab hosts running the Open Source GitLab CE can interpret the same configuration and operate against the same runners (just without the free shared runners, naturally).

This gives the Samba Team options if gitlab.com hosting becomes a problem for any reason.

CI Cloud Requirements

This section is intended to aid in the selection of any future cloud provider.

To be a suitable provider for Samba's CI, a cloud must be able to provide:

Capacity for at least 40 parallel jobs (the current limit is 40, which is often reached during security work, as all jobs then run on the private runners)
- 160 CPUs at peak
- 160 GB RAM at peak
An S3- or Google Cloud Storage-compatible object store is desirable (for caching; not currently available with Rackspace)
The OpenStack API to launch hosts (the current scripts are built around this and Rackspace; each new cloud is non-trivial to set up)
- Docker Machine-compatible driver to launch the runners from gitlab-runner
- Ansible-compatible drivers to launch the bastion host
- Command-line ability to upload SSH keys to launch the bastion host
- API access available from arbitrary networks
Billing to an AMEX card, to allow the SFC to pay for services
- Billing console so we can confirm the current level of billing
Maintained host images for (currently) Ubuntu 18.04 to boot from
- Ideally these would be under a stable name or ID, but updated with any security updates

Current Use

See Scale of Samba's use above.

Cost/benefit estimate

CI saves significant developer and reviewer time, making it easier for new developers to join the project. Even a single new productive developer (assuming typical developer salaries, not that the Samba Team pays these directly) would bring more value than our costs.

However, it is important to realise the order of magnitude of what a CI run costs, so as not to extend the runs without good reason.

GitLab.com pricing: 1,200 minutes (one run, assuming everything used a shared runner) costs $12 USD under the GitLab.com pricing plan.
- Currently we are not charged for GitLab.com shared runners due to a bug. See also the GitLab.com forum post.

Free CI for contributors is a key part of the GitLab offering, so it is unclear what the long term plan is, but CI costs are real and borne by someone eventually.

Rackspace pricing: starting 35 VMs in Rackspace currently costs $16.80 USD. Thankfully, most jobs start on the free (to us) shared runners, and we could use cheaper VMs.

Finally, remember that no matter who pays the financial costs, the resources used to buy the hardware, the electricity produced and the waste heat generated all have an impact on our planet.
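
To make the order of magnitude concrete, the two quoted prices can be put side by side. This is plain arithmetic on the figures on this page, done in US cents to keep it exact; the all-Rackspace monthly total is a hypothetical scenario (every pipeline running entirely on private VMs at the quoted rate), not an actual bill.

```python
# Prices from this page, in US cents so the arithmetic stays exact.
GITLAB_CENTS_PER_1000_MIN = 1_000     # $10 per 1,000 shared-runner minutes
PIPELINE_MINUTES = 1_200
RACKSPACE_CENTS_PER_PIPELINE = 1_680  # $16.80 to start 35 VMs
VMS_PER_PIPELINE = 35
PIPELINES_PER_MONTH = 315

gitlab_cents = PIPELINE_MINUTES * GITLAB_CENTS_PER_1000_MIN // 1_000
rackspace_cents_per_vm = RACKSPACE_CENTS_PER_PIPELINE // VMS_PER_PIPELINE
# Hypothetical: every monthly pipeline run entirely on Rackspace VMs.
all_private_monthly = PIPELINES_PER_MONTH * RACKSPACE_CENTS_PER_PIPELINE

print(f"GitLab.com shared runners: ${gitlab_cents / 100:.2f} per pipeline")  # $12.00
print(f"Rackspace: ${rackspace_cents_per_vm / 100:.2f} per VM")              # $0.48
print(f"All pipelines on Rackspace: ${all_private_monthly / 100:,.2f}/month")  # $5,292.00
```

Set against the roughly $800 USD monthly Rackspace bill quoted earlier, this suggests how much of the load the free shared runners currently absorb.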

Future Cloud: Kubernetes?

If we are willing to put in more effort than a like-for-like port of the existing setup, we should consider whether the native GitLab Kubernetes integration would reduce the maintenance of the script infrastructure.

GitLab moving away from docker-machine

There is an open GitLab ticket to "Migrate away from Docker Machine for autoscaling", which might change things in the future. In any case, we currently pin an old, unsupported version of docker-machine.
