Building Linux 2.6 with CloudBees Accelerator and distcc

Written by: Electric Bee

There are lots of different parallel, distributed build systems in the world besides CloudBees Accelerator. In this post, I'm going to share my recent experience with one popular alternative, GNU make combined with distcc.

Distcc uses an interesting approach to accelerating builds. It leverages the parallel facilities built into GNU make itself, and adds a distribution mechanism that enables it to take advantage of networked CPU resources. This week I decided to take a look at distcc 3.1, which was released in December 2008. It's been some time since I last tried it, so I figured it was worth revisiting to see how the project has evolved and how it compares to Accelerator in its latest incarnation.
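As a quick illustration of that division of labor, plain distcc simply drops in as the compiler while gmake's own -j switch supplies the parallelism. The sketch below is a minimal, hypothetical invocation -- the host names are placeholders, not my actual test machines:

    # Tell distcc which machines may receive compile jobs
    export DISTCC_HOSTS="host1 host2 host3"

    # gmake schedules up to 8 jobs at once; each compile step is handed
    # to distcc, which farms it out to one of the hosts above
    make -j 8 CC=distcc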

Setup

For this experiment, I chose to build the bzImage and modules targets of the Linux 2.6 kernel for the following reasons:

  • Including modules, the Linux 2.6 kernel build is fairly substantial -- around 20,000 source files.

  • It's freely available, should anybody want to replicate my experiments.

  • It's well-known, so if I have made a foolish error in my tests it will be easier for other people to detect and correct.

  • The build system was deliberately designed to facilitate parallel builds.

I used the following packages in my tests:

  • GNU make 3.79.1

  • Distcc 3.1

  • Linux 2.6.28.1

  • CloudBees Accelerator 4.3.1.25685

Finally, my test hardware consisted of 9 systems configured as follows:

  • Dual Xeon 2.4GHz with hyperthreading enabled

  • 8 systems with 1.5 GB RAM; one system with 2 GB RAM

  • Gigabit Ethernet connections on a dedicated switch

  • RedHat Desktop 3, update 8

Process

After downloading the kernel sources, I unpacked them and used make menuconfig to generate a .config file with all default settings, which I saved for reuse to ensure that each test run used an identical configuration. I wrote a simple driver script for the tests:
# Extract the gmake version string (e.g. "3.79.1") for the results directory
gver="`make --version | head -1 | awk '{ print $4 }' | sed -e s/,//`"
lver="2.6.28.1"
targets="bzImage modules"
mkdir gmake-$gver
(
    # Start each test series from a pristine, patched source tree
    rm -rf "linux-$lver"
    tar xjf "linux-$lver.tar.bz2"
    cd "linux-$lver"
    patch -p0 -i ../"linux-$lver.patch"
    for i in 1 2 3 4
    do
        pfx=../gmake-$gver/gmake$i
        make distclean
        cp ../"linux-$lver.config" .config
        make silentoldconfig
        # Time the actual build and capture its output for later review
        (time make $targets) < /dev/null > "$pfx.out" 2>&1
    done
    echo DONE
) < /dev/null > "gmake-$gver/gtest.out" 2>&1
Attentive readers will have noticed that I'm applying a patch to the kernel sources before running the build. That patch just removes a couple of instances of the order-only prerequisite feature in the kernel makefiles, because neither gmake 3.79.1 nor Accelerator 4.3.1 supports that feature; a sketch of the kind of change involved follows.
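For readers who haven't run into them, order-only prerequisites (introduced in GNU make 3.80) sit after a pipe character in the prerequisite list: they must exist before the recipe runs, but their timestamps never force a rebuild. The fragment below is a hypothetical illustration of the kind of rule my patch rewrites, not the actual kernel makefile text, and the variable names are made up for the example:

    # Order-only form (GNU make 3.80 and later): $(obj) must exist
    # before the target is built, but its timestamp never forces a
    # rebuild.
    $(obj)/built-in.o: $(obj-y) | $(obj)

    # Patched form: the pipe is dropped, so $(obj) becomes an ordinary
    # prerequisite that gmake 3.79.1 and Accelerator 4.3.1 accept, at
    # the cost of an occasional extra rebuild when the directory's
    # timestamp changes.
    $(obj)/built-in.o: $(obj-y) $(obj)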

Serial build

Collecting serial build times was completely uneventful. The build ran successfully, albeit slowly, but then that's why we're here, right?

CloudBees Accelerator build

After collecting serial build times, I tweaked the driver script to accommodate building with emake. For these tests, I used the system with 2 GB RAM as both cluster manager and emake host, and configured the remaining systems as agent nodes running three agents each. ElectricMake built the 2.6 kernel out-of-the-box, without any special configuration required. I did run one build to generate a history file, then used that history file for each of the subsequent runs, although for this build the impact of the history file was negligible (that is, there are very few missing dependencies in the build), which is to be expected given the amount of work put into making the build parallel friendly.
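For the curious, the emake invocations looked roughly like the sketch below. I'm reconstructing the option spellings from memory of the emake command line, so treat them as assumptions and check the Accelerator documentation for your version; the cluster manager host name and history file path are placeholders:

    # First run: record a history file capturing any dependencies
    # the makefiles don't declare
    emake --emake-cm=cmhost --emake-history=create \
        --emake-historyfile=../linux-2.6.history bzImage modules

    # Timed runs: reuse (and keep merging into) the recorded history
    emake --emake-cm=cmhost --emake-history=merge \
        --emake-historyfile=../linux-2.6.history bzImage modules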

Distcc build

Finally, I retooled the script one more time to accommodate building with distcc. For these runs, I used the system with 2 GB RAM as the build host, and used the remaining systems as distcc servers; I invoked distcc in "pump mode" with gmake -j 16. The build ran successfully, but I found errors in the build log indicating that "pump mode", the banner feature of distcc 3.x, had been disabled, meaning that the build performance was negatively impacted (error formatted for legibility):

ERROR: compile arch/x86/kernel/asm-offsets.c on blade10,cpp,lzo failed
Warning: remote compilation of 'arch/x86/kernel/asm-offsets.c' failed,
    retried locally and got a different result.
Warning: file 'include/linux/autoconf.h', a dependency of
    arch/x86/kernel/asm-offsets.c, changed during the build
Warning: now using plain distcc, possibly due to inconsistent file system
    changes during build

After some investigation I learned that pump mode does not automatically handle header files that are modified during the build. I applied the prescribed workaround, with the addition of the specific header file mentioned in the error message I received, and tried again... with the same result. After several more iterations, covering one and a half days and including some detailed analysis of the build made possible by Accelerator's file-level annotation, I managed to get distcc working in pump mode. Note that plain distcc worked out-of-the-box; it was only pump mode that gave me trouble (details available on request).
Once I worked out the kinks, I reran the distcc tests with -j 8 and -j 24, and I tried including "localhost" as one of the distcc compile servers. At best these changes had no impact on performance; most of them made the build slightly slower.
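For reference, a pump-mode run looks roughly like the sketch below. The host list is a placeholder reconstruction rather than my exact DISTCC_HOSTS setting; the ",cpp,lzo" suffixes are what enable remote preprocessing and compression in distcc 3.x:

    # Each compile server gets ",cpp,lzo" so it can preprocess remotely
    # (pump mode) and compress traffic on the wire
    export DISTCC_HOSTS="blade2,cpp,lzo blade3,cpp,lzo blade4,cpp,lzo \
    blade5,cpp,lzo blade6,cpp,lzo blade7,cpp,lzo blade8,cpp,lzo blade9,cpp,lzo"

    # "pump" starts distcc's include server, runs the wrapped build,
    # and shuts the include server down afterwards
    pump make -j 16 CC="distcc gcc" bzImage modules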

Results and Analysis

Build tool              Average (4 runs)   Standard deviation   Comparison to serial
Serial gmake            32m29.25s          4.99s                1.0x
Distcc/gmake            4m25.75s           4.50s                7.35x faster
CloudBees Accelerator   2m38.00s           1.41s                12.34x faster

As you can see, both distcc and Accelerator do a good job of accelerating the build, but for raw speed (and, I think, for ease of implementation) Accelerator takes the crown on this one. Why is that so? I think there are two factors that contribute to our success here:

  1. Accelerator distributes all work to the cluster: in addition to compiles, Accelerator distributes code gen, links, packaging, and even makefile parsing to the cluster. This significantly increases the amount of work that can be done in parallel, and reduces the amount of work performed on the build host itself, preventing it from becoming a bottleneck.

  2. Accelerator aggressively parallelizes recursive make invocations. In a variety of common situations, gmake does not parallelize recursive makes, even when invoked with -j. For example:

    all:
        $(MAKE) util
        $(MAKE) app1
        $(MAKE) app2
        $(MAKE) app3
    

    In this situation, gmake will not parallelize the work in the recursive makes, and in fact it cannot -- doing so would risk breaking the build. But Accelerator can and does parallelize the recursive makes, and it can do so safely because of our conflict detection and resolution technology. This often gives us an edge over other build systems, particularly if there is substantial work in recursive makes. Of course, since distcc relies on gmake to handle parallelization, it is subject to the same limitations.

It's worth mentioning that Accelerator also has an edge in terms of the artifacts left over after the build completes. When the distcc builds finished, I had the build output and the standard build log. When the Accelerator builds finished, I had those same artifacts as well as an Accelerator annotation file, which provides a gold mine of performance and dependency information about the build -- something I personally find indispensable.

Conclusion

Although distcc is more capable now than ever before, in the most important measure -- raw speed -- Accelerator still beats it hands down. I think this experiment also underscores the shortcomings of distcc's approach to build acceleration -- without distributing more work to the cluster, I don't know that distcc will ever be able to match Accelerator for speed, at least for large builds that produce multiple outputs. For smaller, simpler builds, who knows -- but then, that's not really the target that Accelerator is aiming for.
