Remote C/C++ compilation using Ccache & Distcc
Here at Zaleos, we are in favor of testing everything in our development process. In one of our larger projects, we are using C/C++, and we have been adding more tools to our CI pipeline to test more and more aspects of the code. We have unit tests, memory leak checkers, code format checkers, integration tests, API tests, documentation tests…
The problems start kicking in
Even though all this testing is, in general, a great thing, it did have a major drawback: our build times on CI sometimes took up to an hour on the longer jobs. This situation was not ideal, and our CI provider did not scale adequately for our case, so we had to look into a solution.
Giving Ccache & Distcc a spin
We were already using Ccache in our build flow, which is a great, simple tool for caching build artifacts. Ccache has helped us a lot, but it isn’t enough to keep the build times acceptably low. We decided to try distcc as a possible mid-term solution to our problem. For those unfamiliar with distcc, it is a tool used to distribute the compilation of C/C++ code across different machines.
Our goal was to trigger builds on our local machines which would compile on the build farm and then cache the results, so that other developers, or even the CI platform itself, could then use the cache to speed up their build process. Distcc can be set up to use SSH to communicate with the build server, to minimize security risks.
With this setup, there are 3 possible build scenarios:
1. The cache is not available on the local or remote Ccache
2. The cache is not available locally but is available on the build server
3. The cache is available locally, so the remote build server is not queried at all
We're going to go over each of these scenarios, so let's get started!
Initial setup
On the build client side, to use Distcc and Ccache together, we use the CCACHE_PREFIX setting for Ccache, which prefixes the compiler invocation with the command passed to it (distcc in this case) whenever the compiler actually needs to run. This is done with the following line in our build script:
export CCACHE_PREFIX=distcc
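To make the interaction concrete, here is roughly what happens once the prefix is set (a simplified sketch of the flow, not the exact internals):
# With CCACHE_PREFIX=distcc, a compiler call such as:
ccache g++ -c main.cpp -o main.o
# behaves roughly like this:
#  1. ccache hashes the input and looks the result up in its local cache
#  2. on a miss, ccache runs the real compiler through the prefix:
#     distcc g++ -c main.cpp -o main.o
#  3. distcc preprocesses locally and ships the job to a host in DISTCC_HOSTS
#  4. ccache stores the returned main.o, so the next identical compile
#     is served from the local cache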
There are two ways of connecting to the build client and build server using distcc: by direct TCP connection (usually on port 3632), or by SSH (in which the distcc client transparently connects over SSH and executes a distcc command). The advantage of using direct TCP connections is that the communication is faster, but on the downside, it is not secure. This option should only be used on private, trusted networks, otherwise SSH should be used. The setup we made uses SSH for the communication.
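For illustration, the transport is selected through the syntax of the DISTCC_HOSTS variable (the address below is the build server from our example setup, and /8 is an arbitrary concurrent-job limit):
# Direct TCP connection (trusted networks only); 3632 is distcc's default port
export DISTCC_HOSTS="10.22.66.101:3632/8"
# SSH connection (the one we use); the leading @ makes distcc connect over SSH
export DISTCC_HOSTS="@10.22.66.101/8"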
On the build server side, we also want the compilation results to be saved in a cache, so that the same builds are not repeated over and over when avoidable. This is done with the DISTCC_CMDLIST variable: we create a file listing the allowed compiler commands, pointing them at the Ccache masquerade wrappers, so every job that reaches the server is compiled through Ccache:
[vagrant@buildserver1 ~]$ cat /home/vagrant/distcc_cmdlist.cfg
/usr/lib64/ccache/c++
/usr/lib64/ccache/cc
/usr/lib64/ccache/g++
/usr/lib64/ccache/gcc
Once this file is created, we need to set an environment variable pointing to this file:
export DISTCC_CMDLIST=/home/vagrant/distcc_cmdlist.cfg
The easiest way to do this is to add this line to the .bashrc file of the distcc user on the build server. An alternative is to enable the PermitUserEnvironment setting for SSH (you can check the sshd_config man page for details, but beware, enabling this option does pose a security risk).
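For example, on the build server (assuming the vagrant user from our setup is the one distcc connects as):
# Make the variable visible to the non-interactive shells
# that the distcc client spawns over SSH
echo 'export DISTCC_CMDLIST=/home/vagrant/distcc_cmdlist.cfg' >> ~/.bashrc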
We have provided a Vagrantfile that uses Ansible to set up the remote building configuration. First, install Vagrant and Ansible, then clone the repository and start the provisioning:
git clone https://github.com/zaleos/post-ccache-distcc
cd post-ccache-distcc
vagrant up
Build path 1 - Cache is not available on client or server
This scenario will happen the first time we run a build for a given set of changes.
Compilation times:
[vagrant@buildclient src]$ time ./build-remote.sh
...
real 0m8.361s
user 0m0.553s
sys 0m0.368s
distccmon-text output (note that this command only outputs the state of distcc at the exact time the command is launched):
[vagrant@buildclient src]$ distccmon-text
47025 Compile main.cpp 10.22.66.101[0]
Local Ccache:
[vagrant@buildclient src]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:31:57 2021
stats zeroed Mon Oct 18 13:31:49 2021
cache hit (direct) 0
cache hit (preprocessed) 0
cache miss 4
cache hit rate 0.00 %
called for link 5
no input file 2
cleanups performed 0
files in cache 10
cache size 57.3 kB
max cache size 5.0 GB
Remote Ccache:
[vagrant@buildserver1 ~]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:37:43 2021
stats zeroed Mon Oct 18 13:37:06 2021
cache hit (direct) 0
cache hit (preprocessed) 0
cache miss 4
cache hit rate 0.00 %
cleanups performed 0
files in cache 8
cache size 32.8 kB
max cache size 5.0 GB
From this, we can see that the cache hit rate is 0% on both the client and the build server - this is expected, as this changeset has not been built anywhere before.
Build path 2 - Cache is not available on the client but is on the server
This scenario may happen when we have already built a changeset and we have cleared the local cache (for space requirements, for example), or if some other developer starts working on the project.
For our tests, we will have to clear the local build client Ccache manually before executing the build script to trigger this scenario:
[vagrant@buildclient src]$ ccache -Ccz
Cleared cache
Cleaned cache
Statistics zeroed
[vagrant@buildclient src]$ time ./build-remote.sh
...
real 0m7.600s
user 0m0.532s
sys 0m0.345s
[vagrant@buildclient src]$
Local Ccache:
[vagrant@buildclient src]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:42:15 2021
stats zeroed Mon Oct 18 13:40:59 2021
cache hit (direct) 0
cache hit (preprocessed) 0
cache miss 4
cache hit rate 0.00 %
called for link 5
no input file 2
cleanups performed 0
files in cache 10
cache size 57.3 kB
max cache size 5.0 GB
Remote Ccache:
[vagrant@buildserver1 ~]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:42:14 2021
stats zeroed Mon Oct 18 13:37:06 2021
cache hit (direct) 0
cache hit (preprocessed) 4
cache miss 4
cache hit rate 50.00 %
cleanups performed 0
files in cache 10
cache size 41.0 kB
max cache size 5.0 GB
These results are slightly better than in the first scenario, and for larger projects, the difference should be greater. Note that the remote Ccache hit rate has gone up to 50% in this case: the 4 cache misses from the first build plus the 4 preprocessed cache hits from this build make 4 hits out of 8 compilations.
Build path 3 - Cache is available on the client
This scenario will happen when a given changeset has already been built and the results are stored in the local Ccache.
The output will be similar to:
[vagrant@buildclient src]$ time ./build-remote.sh
...
real 0m3.334s
user 0m0.322s
sys 0m0.284s
Local Ccache:
[vagrant@buildclient src]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:45:42 2021
stats zeroed Mon Oct 18 13:40:59 2021
cache hit (direct) 4
cache hit (preprocessed) 0
cache miss 4
cache hit rate 50.00 %
called for link 10
no input file 4
cleanups performed 0
files in cache 10
cache size 57.3 kB
max cache size 5.0 GB
Remote Ccache:
[vagrant@buildserver1 ~]$ ccache -s
cache directory /home/vagrant/.ccache
primary config /home/vagrant/.ccache/ccache.conf
secondary config (readonly) /etc/ccache.conf
stats updated Mon Oct 18 13:42:14 2021
stats zeroed Mon Oct 18 13:37:06 2021
cache hit (direct) 0
cache hit (preprocessed) 4
cache miss 4
cache hit rate 50.00 %
cleanups performed 0
files in cache 10
cache size 41.0 kB
max cache size 5.0 GB
As you can see from the output, when the local Ccache is available, the build is faster. Also note that the local Ccache hit rate has gone up, and the remote Ccache remains constant (as the build client has not needed to send any requests to the build server). Again, for larger projects, this really pays off.
Other interesting notes
Checking the build & cache activity
Some useful commands can be used to monitor the status of the distcc client and Ccache:
Show state of remote compilation jobs - this should be run on the build client
[vagrant@buildclient src]$ watch -n 0.5 distccmon-text
Show the state of Ccache usage - can be run on the build client or the build server
[vagrant@buildserver1 ~]$ watch -n 0.5 ccache -s
Additional build servers
If you want to test multiple build servers, you can edit the Vagrantfile to increment the build server counter in this line:
NUM_BUILD_SERVERS = 1 # Add more build servers by increasing this number
And also uncomment and/or add more lines in the build-remote.sh script file:
#init_distcc_build_server 10.22.66.102 16 # buildserver2
A note regarding this function: what init_distcc_build_server does is populate the DISTCC_HOSTS environment variable with the SSH address of the build server it is passed, and add the remote server's fingerprint to the list of known hosts. As the second parameter, you pass the maximum number of concurrent build tasks you want to send to this server. DISTCC_HOSTS will be prefixed with a --randomize argument, which makes distcc select a host randomly to send the build tasks to. Unfortunately, we haven't found a way to dynamically send more or fewer tasks depending on the remote machine's build load, so the server quickly runs out of resources.
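For reference, a minimal sketch of what such a function can look like (the actual implementation lives in build-remote.sh in the repo; treat this as an approximation, not a verbatim copy):
init_distcc_build_server() {
    local host="$1"     # IP address of the build server
    local max_jobs="$2" # maximum concurrent build tasks for this host
    # Trust the server's SSH fingerprint so distcc can connect non-interactively
    ssh-keyscan -H "$host" >> ~/.ssh/known_hosts 2>/dev/null
    # Append the host in distcc's SSH syntax (@HOST/LIMIT), keeping the
    # --randomize prefix in front of the host list
    export DISTCC_HOSTS="${DISTCC_HOSTS:---randomize} @${host}/${max_jobs}"
}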
Additional distcc configuration
If for some reason Distcc isn't being called on the build server, here are some environment variables that can be set to disable local compilation and enable verbose debugging messages:
export DISTCC_VERBOSE=1 # Enable verbose debug logs
export DISTCC_FALLBACK=0 # Disable local compilation fallback
export DISTCC_SKIP_LOCAL_RETRY=1 # Disable local compilation retry
Conclusion
Our initial impression of distcc was very positive, and it did allow us to use lower-end build clients to achieve similar results to those of our higher-end setups. Unfortunately, when we did some further testing, such as making different changes in our codebase and triggering various simultaneous builds, simulating what would be our daily workload (developers + Continuous Integration), our test build server ran out of RAM very quickly.
Some of the issues that we have had with this setup are:
- There is no deduplication of the build jobs. If identical build jobs are submitted simultaneously, they will all be processed, even if only one compilation is required.
- Rapidly running out of resources on the build server. When many jobs are submitted by different clients simultaneously, there is no queuing mechanism, so they are all launched at once.
- Only the compilation step is distributed. The preprocessing and the linking happen on the local machine. Note that we're using the direct mode of Ccache, not the preprocessor mode. Distcc does have a way to distribute preprocessing tasks as well (pump mode), but unfortunately, it's not compatible with Ccache; see the illustrative invocation after this list.
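For reference, pump mode is normally enabled by wrapping the build in distcc's pump script, which distributes preprocessing to the servers along with compilation; we show it purely for illustration, since it cannot be combined with the Ccache setup described in this post:
# pump mode example - incompatible with CCACHE_PREFIX=distcc
pump make -j16 CC="distcc gcc" CXX="distcc g++"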
We could have scaled horizontally at this point, adding more build servers to spread the build load. Services such as clouding.io can be used to spin up cloud servers for this type of task. Unfortunately, much of the benefit of caching the build artifacts would be lost, because the cache is not shared between the build servers; sharing a cache might be difficult and was out of the scope of this work, so we discarded this option.
What’s next?
Distcc may be good for some scenarios, like when fewer concurrent builds are required, but for our current use case, it isn’t the best fit.
This is an ongoing task, but the next step we are going to take is to explore Google's build tool Bazel (https://bazel.build), which looks very promising indeed, although it may require more upfront work to set up in our projects. Bazel uses the Remote Execution API, a standard API for spreading builds across multiple hosts. It could also bring build consistency across all our products.
Some of the other tools that we considered using are:
- Earthly (https://earthly.dev) - a very nice tool, which is kind of a mix between Makefiles and Docker. We use it in some of our projects, but when we tried to use it with the large C++ project described in this article, we had some issues with cache and testing (we were using experimental features, to be fair), so we discarded it in this case. It is definitely worth checking out to see if it adapts to your use case, though.
- sccache (https://github.com/mozilla/sccache) - a tool to share the compilation cache (either locally or in a cloud storage backend such as S3, Redis, Memcached...).
- Recc (https://gitlab.com/BuildGrid/recc)
- Goma (https://chromium.googlesource.com/infra/goma/server/ and https://chromium.googlesource.com/infra/goma/client/)
- DMUCS (http://dmucs.sourceforge.net)
- Mmode (https://github.com/MedicineYeh/mmode)
- Mold (https://github.com/rui314/mold)
References
- Ccache can be found here: https://ccache.dev
- The distcc tool mentioned in this article can be found here: https://distcc.github.io/
- The Github repo with the example setup is located at https://github.com/zaleos/post-ccache-distcc
Further reading:
- https://developers.redhat.com/blog/2019/05/15/2-tips-to-make-your-c-projects-compile-3-times-faster
- https://wilsonhong.medium.com/distcc-using-ccache-on-distcc-server-6abebe906971
- https://wiki.gentoo.org/wiki/Distcc
Cover image credit goes to Fabio