Most modern development teams have set up some form of build automation. At minimum, build automation involves checking out code from source control, compiling and linking, and packaging the resulting binaries and other necessary files. These packages can then be deployed and tested in downstream processes. If done correctly, build automation reduces developers’ manual labor, ensures builds are consistent and complete, and improves product quality.
For over a decade, Electric Cloud has helped simplify and automate some of the world’s biggest and most complex build systems, at companies like Qualcomm, SpaceX, Cisco, GE, Gap, and E*TRADE. Our experience has taught us about three critical problems with build automation that developers and build managers face across a wide range of industries, and that have a critical impact on software quality and release times.
On this page we’ll share some insight into these problems, which often go unresolved: why they matter, quick fixes you can implement immediately, and “heavyweight” solutions used in the industry to address them.
The more components you add to your software, the more lines of code you maintain, and the more tests and routines you run as part of your build process, the longer the build will take to run. Some development environments and technologies, such as C/C++, also tend to be associated with longer builds.
In Agile work environments, builds are expected to be run frequently and the organization depends on build output to guide development work. Long builds can be a big problem, and one that can “creep up” on a development team.
Some builds are so long they can only be run nightly. But even if your “normal” build takes just 20 or 30 minutes longer than it should, there are significant costs to your organization:
Developer wait time – as we showed in a survey we conducted last year, most developers spend between 2 and 10 hours per week waiting for builds to complete, and more often than not, this is lost time.
Context switch – while the build runs, developers switch away from the current task to other tasks. If the build is long, they lose context on the problem they were working on, which reduces productivity. In other words, if builds are supremely fast, developers never have to stop thinking about the problem they are working on right now.
Build bottleneck – during intensive development phases such as “integration storms”, developers sync changes into the main code line and frequent builds are needed to get everything working. The less frequent the build, the fewer problems developers can solve each workday.
Product quality – when a developer commits changes that break something, the breakage is only discovered after the build runs. When the bug is fixed, there is again a lag while waiting for the next build before QA can verify the fix. The less frequent the build, the fewer issues can be fixed and verified before the release, hurting product quality.
Sometimes individual builds run relatively quickly, but an organization may run dozens or hundreds of builds in each dev/test cycle. This could be because numerous teams (sometimes thousands of developers) are developing different software components, each running its own build. Or the dev team might need to deliver numerous versions of the software for different platforms, customized builds for different customers, and so on.
If you’re a build engineer responsible for running 500 builds, you’ll feel the pain even if each of them is 10 minutes long (about 83 hours of build time without parallelization). Each build may be short, but cumulatively they take a very long time to run.
But if you’re a developer or QA engineer using such a build system, and your build takes only 10 minutes, why should you care?
Limited build resources – because the organization is running large numbers of builds, you’ll find you have limited access to build servers during specific time windows, or the servers are often overloaded and builds take much longer. When you rely on running builds often to get fast feedback and fix bugs, having to “wait in line” for your builds hurts your productivity (particularly if you’re practicing Agile), and you can’t move development forward fast enough because you’re waiting for a build server.
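The arithmetic above is easy to generalize. As a minimal sketch, assuming every build takes the same time and using a hypothetical `parallel_slots` parameter for how many builds your servers can run at once, the wall-clock cost looks like this:

```python
import math


def wall_clock_hours(n_builds: int, minutes_each: float, parallel_slots: int = 1) -> float:
    """Wall-clock hours for n equal-length builds, run `parallel_slots` at a time."""
    rounds = math.ceil(n_builds / parallel_slots)  # a partly filled last round still costs a full build
    return rounds * minutes_each / 60


print(round(wall_clock_hours(500, 10), 1))  # 83.3 hours, one build at a time
print(wall_clock_hours(500, 10, 8))         # 10.5 hours with 8 build slots
```

Even with a hypothetical 8 build slots, the same 500 short builds still occupy the build infrastructure for more than a working day.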
Software projects use a large number of modular components: different frameworks, components developed by different teams or by 3rd-party partners, open source libraries, and so on. As your product evolves, there are multiple versions of your own code, and also multiple versions of these many components, creating a many-dimensional matrix of dependencies and supported behaviors. That’s where things get complex to build.
Complex builds reduce the flexibility of the build process and make it much more difficult to manage and run the build:
Complex builds are brittle – interactions between many different components often lead to manual errors, broken builds and, worse, builds that run correctly but introduce bugs due to partial or incorrect sources.
Extensive manual efforts – executing a complex build and delivering its results requires a substantial manual effort. Even if the build is automated, it is typically automated in small pieces/components, and there is no orchestration of the entire process.
Incremental builds are difficult – often you’ll want to run a partial build and reuse items that haven’t changed and were previously compiled. With complex builds, due to partially specified dependencies, an incremental run could break the build, and teams are forced to run the entire build in all scenarios.
Legacy components and fear of change – complex builds tend to have legacy components written years ago by staff members who are no longer at the organization. This impedes changes or optimizations to the build, for fear of breaking legacy components that are not well understood.
Complex builds are long – there is a correlation between the complexity of a build and the time it takes to run, which introduces the additional issues described above.
If your organization experiences at least some of the problems we detailed above, the rest of this page will show you what you can do to solve them:
Some quick fixes to improve build speed:
See step-by-step instructions for several easy-to-implement fixes that can cut build length.
Address long and complex builds:
These are more “heavyweight” solutions that can have a bigger impact on long and complex builds, but require more effort:
CloudBees Accelerator is an alternative to traditional approaches, which can reduce build time by up to 20X without requiring dedicated hardware, and while guaranteeing consistency even in the most complex scenarios.
In this section we’ll discuss a few common solutions to improving build speed, and take you through them step by step.
Note: In our experience the most severe long-build issues occur in C/C++, C#, .NET and related technologies. We’ll focus on two build systems used in these environments: Make and Visual Studio. But the same techniques are applicable to many other build systems.
Faster CPUs, a newer motherboard with faster disk access, and more or faster memory can all reduce build time. Most importantly, the number of CPUs on your build server limits how many build jobs can run in parallel within a single build.
One way to make builds go faster is to run jobs in parallel on the same machine. If you have a fast build server, or can upgrade it by adding more CPUs, this can significantly reduce build time.
We’ll cover three ways of running builds in parallel:
Using the “make -j” command
Running Visual Studio MSBuild projects in parallel
Running MSBuild items in parallel
A few caveats regarding parallel builds:
Parallel builds are limited to one machine – depending on the number and complexity of the builds you are running, one machine may not be enough. Clustering the build across several physical machines is more complex; we discuss it below, in the section on distributed builds.
Parallel execution can break your build – in some cases, because of dependencies between different parts of the build, or between build projects, parallel builds will not run correctly. The more jobs you run in parallel, the more severe the problem will be. We explain this in more detail in each of the parallelization techniques described.
Run the build as usual, but add the -j option to the Make command, like this: make -j x, where x is the number of jobs you want to run in parallel. For example, make -j 2 runs two jobs in parallel.
We recommend starting with 2 parallel jobs and checking whether the build succeeds. Then gradually increase the number of concurrent jobs and see at which point the build fails. You will not be able to parallelize beyond that point, so use the highest number of jobs that can run safely. Also, avoid running more jobs than the number of CPU cores available on the build server.
NOTE: Why do builds fail when the number of parallel jobs increases?
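This trial-and-error procedure can be scripted. The sketch below assumes a GNU Make project with a `clean` target (not every Makefile has one), and `make_runner` and `max_safe_jobs` are our own hypothetical helper names. Because failures caused by missing dependencies are intermittent, it repeats each job count several times, and a passing run still needs its output tested.

```python
import os
import subprocess
from typing import Callable, Optional


def make_runner(build_dir: str) -> Callable[[int], bool]:
    """Return a function that runs a clean `make -j N` build and reports success."""
    def run(jobs: int) -> bool:
        subprocess.run(["make", "clean"], cwd=build_dir, check=True)  # assumes a `clean` target
        result = subprocess.run(["make", "-j", str(jobs)], cwd=build_dir)
        return result.returncode == 0
    return run


def max_safe_jobs(run_build: Callable[[int], bool], trials: int = 3,
                  max_jobs: Optional[int] = None) -> int:
    """Raise the job count from 2 until a build fails; never exceed the core count."""
    limit = max_jobs or os.cpu_count() or 1
    best = 1
    for jobs in range(2, limit + 1):
        # Failures from implicit dependencies come and go, so retry each level.
        if all(run_build(jobs) for _ in range(trials)):
            best = jobs
        else:
            break
    return best
```

On a real project you would call something like `max_safe_jobs(make_runner("path/to/project"))`; separating the runner from the search also lets you swap in a runner that checks the build output, not just the exit code.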
The reason the build might fail when you increase the number of jobs is that many Makefiles have implicit dependencies. If a dependency is explicitly defined in the Makefile, make -j takes it into account and only builds a target once its prerequisites have been built (see this Stack Overflow question for more details on how make -j handles dependencies). But if there are implicit dependencies, make -j behaves unpredictably.

For example, suppose the original Makefile generates some header files and then builds some objects that include those headers, but this dependency is not declared in the Makefile. The build still runs properly in serial mode, because the headers happen to be generated first and only then are the objects that depend on them built. But when you run make -j 2, some of the header files may be generated in parallel with the objects that include them. Those objects will then not have the header files they need, causing the build to fail or, even worse, to appear successful while compiling broken sources.

This is a simple example; there are much more complex cases, especially with recursive Makefiles, in which it is very difficult to discover that an implicit dependency exists at all.

In our experience, implicit dependencies are very common. Because a serial build runs correctly despite them, developers typically don’t bother specifying them, and the dependency doesn’t become apparent until you start running things in parallel. It takes a lot of time and expertise to untangle a Makefile of any complexity and explicitly define all of its dependencies.

If there are implicit dependencies, the parallel build will succeed at times and fail at other times, depending on which jobs happen to be scheduled in parallel.
The more jobs you run in parallel, the higher the probability that an implicit dependency will be violated and the build will fail, or pass and create a broken output.
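To make this failure mode concrete, here is a deliberately simplified model of make -j. It is not how GNU Make schedules jobs internally: it groups targets into “waves” of at most `jobs` targets, in Makefile order, and starts a target only after its declared prerequisites finished in an earlier wave. The target names are hypothetical.

```python
from typing import Dict, List, Tuple


def schedule(order: List[str], declared: Dict[str, List[str]], jobs: int) -> List[List[str]]:
    """Group targets into waves of at most `jobs`, in Makefile order. A target may
    start only after all of its declared prerequisites finished in an earlier wave."""
    done, remaining, waves = set(), list(order), []
    while remaining:
        wave: List[str] = []
        for target in remaining:
            if len(wave) == jobs:
                break
            if all(dep in done for dep in declared.get(target, [])):
                wave.append(target)
        if not wave:
            raise ValueError("declared dependencies form a cycle or name an unknown target")
        for target in wave:
            remaining.remove(target)
        done.update(wave)
        waves.append(wave)
    return waves


def violations(waves: List[List[str]], actual: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """(target, dependency) pairs where the target did not run strictly after what it needs."""
    wave_of = {t: i for i, wave in enumerate(waves) for t in wave}
    return [(t, d) for t, deps in actual.items() for d in deps if wave_of[d] >= wave_of[t]]


order = ["generated.h", "foo.o"]     # order in which the Makefile lists the targets
actual = {"foo.o": ["generated.h"]}  # foo.c #includes generated.h, but nobody declared it
print(violations(schedule(order, {}, jobs=1), actual))      # [] (serial order hides the bug)
print(violations(schedule(order, {}, jobs=2), actual))      # [('foo.o', 'generated.h')]
print(violations(schedule(order, actual, jobs=2), actual))  # [] (dependency now declared)
```

In a real Makefile, the fix corresponds to declaring the prerequisite explicitly, for example with a rule line such as `foo.o: generated.h`.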
If the parallel build fails, or you uncover a problem in the build output, examine the logs. We recommend keeping a log of a successful serial build to compare with the log of the failed build.
NOTE: Make -j writes logs in a different order than the original serial build.
There are several logging options: writing each line as it is executed (which results in an interleaved log, with lines from different modules/components mixed together), or grouping output by target or by recursive invocation. Whichever option you choose, the order of the log depends on the order in which build commands happen to run in parallel.

This makes comparing a Make -j log to the regular serial log tricky: you will have to isolate the component that caused the problem, find it in the original serial build log, and compare the two to see what went wrong.

For more details on Make -j, see the GNU Make documentation for
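One practical first pass ignores order entirely and looks only at which lines appear in one log but not the other. GNU Make 4.0 and later can also keep each target’s output together with `--output-sync=target` (short form `-O target`), which makes the remaining differences easier to read. A minimal sketch, assuming both logs are plain text with one command or message per line; the function name and sample lines are our own:

```python
from collections import Counter
from typing import List, Tuple


def unordered_log_diff(serial: List[str], parallel: List[str]) -> Tuple[List[str], List[str]]:
    """Return (lines only in the serial log, lines only in the parallel log),
    ignoring the order in which the lines were written."""
    s, p = Counter(serial), Counter(parallel)
    return sorted((s - p).elements()), sorted((p - s).elements())


serial_log = ["gcc -c a.c", "gcc -c b.c", "gcc -o app a.o b.o"]
parallel_log = ["gcc -c b.c", "gcc -c a.c",
                "a.c:1:10: fatal error: gen.h: No such file or directory",
                "gcc -o app a.o b.o"]
missing, extra = unordered_log_diff(serial_log, parallel_log)
print(missing)  # []
print(extra)    # ['a.c:1:10: fatal error: gen.h: No such file or directory']
```

For real logs, read each file with `Path("serial.log").read_text().splitlines()`. Lines containing timestamps or temporary file names will show up as differences and may need to be filtered out first.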
Unlike Make -j, this option runs entire project builds in parallel on different processors, but does not parallelize individual build items within a project. It is therefore only useful if you’re building several projects at the same time.
In the Visual Studio IDE, on the Tools menu, click Options.
Expand the Projects and Solutions folder, and then select Build and Run.
In the Maximum number of parallel project builds text box, define how many projects will be allowed to run in parallel. This shouldn’t be higher than the number of CPUs on your build server.
Open the solution containing the projects that you want to build.
From the Build menu, select Batch Build.
In the Build column, check the build configurations of the projects you want to build.
NOTE: Dependencies between projects could result in inconsistent builds.
Visual Studio does respect project-to-project references and builds the referenced project before the referring project. However, if several projects share a reference, that reference is built only once and “cached” for subsequent references. Also, an error or exception in one build does not affect the running of other builds, even those that may depend on the failing build. See the full list of
Click the button for the build action that you want to perform (Build or Rebuild).
The project system performs the multiprocessor build action and displays the build output in the Output window.
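The same project-level parallelism is available from the command line through MSBuild’s /m (maxcpucount) switch, which is convenient on a build server. A small sketch that caps the requested value at the machine’s core count, following the advice above; the solution name and configuration are placeholders:

```python
import os
from typing import List


def msbuild_command(solution: str, requested_jobs: int,
                    configuration: str = "Release") -> List[str]:
    """Build an msbuild command line with /m capped at the number of CPU cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    jobs = max(1, min(requested_jobs, cores))
    return ["msbuild", solution, f"/m:{jobs}", f"/p:Configuration={configuration}"]


print(" ".join(msbuild_command("MyProduct.sln", 8)))
```

The resulting list can be passed to `subprocess.run` on the build server.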
This option is similar to Make -j, in that it builds individual targets within the build in parallel on different processors.
In the Visual Studio IDE, open Project Properties.
Select Configuration Properties, then C/C++ > General.
Set Multi-Processor Compilation to Yes (/MP). This tells the compiler to compile the project’s source files in parallel, using multiple processes.
In Tools > Options, select Projects and Solutions, then VC++ Project Settings.
Set Maximum concurrent C++ compilations to the number of processors you have on your machine (or the number of processors you want to run the parallel build on).
NOTE: Visual Studio’s multi-processor compilation flag (/MP) is not compatible with some other build options, such as minimal rebuild (/Gm, a form of incremental compilation) and creating a precompiled header (/Yc). If the build uses a conflicting option, the compiler ignores /MP and the build will not run on multiple processors.
Now, when you run the build, it will execute in parallel on several CPUs.
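Before turning on /MP across many projects, it can help to scan each project’s compiler options for switches that disable it. The sketch below checks a list of cl.exe options against the options Microsoft’s /MP documentation lists as incompatible (/E, /EP, /Gm, /showIncludes and /Yc, for which the compiler warns with D9030 and ignores /MP). Treat the list as a starting point and check it against the documentation for your compiler version. The #import directive is also incompatible, but it lives in source code, so it is not checked here.

```python
from typing import List

# Options documented as incompatible with /MP, matched exactly.
EXACT_CONFLICTS = {"/E", "/EP", "/Gm", "/showIncludes"}
# /Yc may carry a header name, e.g. /Ycpch.h, so it is matched as a prefix.
PREFIX_CONFLICTS = ("/Yc",)


def mp_conflicts(options: List[str]) -> List[str]:
    """Return the options that would make cl.exe ignore /MP."""
    found = []
    for opt in options:
        norm = "/" + opt[1:] if opt.startswith("-") else opt  # cl accepts -X as well as /X
        if norm in EXACT_CONFLICTS or norm.startswith(PREFIX_CONFLICTS):
            found.append(opt)
    return found


print(mp_conflicts(["/MP", "/EHsc", "/Yupch.h", "/O2"]))  # []
print(mp_conflicts(["/MP", "-Gm", "/Ycpch.h"]))           # ['-Gm', '/Ycpch.h']
```

Note that using a precompiled header with /Yu is fine; only the compilation that creates it with /Yc conflicts.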
Another approach to improving build performance is “build avoidance”, which reduces build times by rebuilding only the pieces that need to be rebuilt, not the whole code base. Tools like ccache and ClearCase winkins have co-opted the term build avoidance, but they are actually doing object reuse: using objects that other people have compiled so you can skip compiling them yourself. Object reuse only works in narrow scenarios, and it can be a headache to manage. A more traditional build avoidance tactic is an incremental build from the top of the build tree: finding all the sources that have been modified since the last build and recompiling just those.
Build avoidance makes a lot of sense because development typically focuses on a specific module or modules rather than the entire product, so the rest of the product doesn’t change and incremental builds run relatively fast. It’s especially useful for builds run by developers themselves, who are usually working on one isolated component. It would be wasteful to build the full project just to test changes to one component that makes up only a small percentage of the code.
Two common tools used for build avoidance are Rational ClearCase and the open source CCache. The basic idea is the timestamp comparison Make itself performs: if the build runs and finds a pre-existing .o file that is older than its corresponding .c file, the .o file is out of date and needs to be rebuilt. Otherwise, the .o file is left in place (“avoiding” recompilation of that part of the software). CCache itself decides whether it can reuse a cached object by hashing the compiler’s input, not by comparing timestamps.
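The timestamp rule is simple enough to express directly. A minimal Python sketch of the same comparison Make performs for each target, using hypothetical file names foo.c and foo.o in a temporary directory:

```python
import os
import tempfile


def needs_rebuild(source: str, target: str) -> bool:
    """Make-style rule: rebuild if the target is missing or older than its source."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, obj = os.path.join(d, "foo.c"), os.path.join(d, "foo.o")
        open(src, "w").close()
        print(needs_rebuild(src, obj))  # True: foo.o does not exist yet
        open(obj, "w").close()
        os.utime(src, (1000, 1000))     # foo.c last modified at t=1000
        os.utime(obj, (2000, 2000))     # foo.o built later, at t=2000
        print(needs_rebuild(src, obj))  # False: foo.o is up to date
        os.utime(src, (3000, 3000))     # foo.c edited after the build
        print(needs_rebuild(src, obj))  # True: foo.o is stale
```

Real build tools apply this check to every prerequisite of a target, including headers, which is exactly where undeclared dependencies cause incremental builds to go wrong.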
A few caveats regarding build avoidance, before we show how to do it:
Not recommended for production builds – In most scenarios we have seen, dev teams did not rely on build avoidance in production. As we discussed in relation to parallel builds, there are implicit dependencies and relationships between Makefiles. In some cases, these dependencies will cause an incremental build to break. If it happens occasionally, this might be acceptable for development or CI builds, but you don’t want to deal with these issues when deploying a release into production.
Incremental builds might be slow, wasteful and unreliable for complex builds – the larger your build and the more heavily recursive the Makefiles, the more likely it is that build avoidance will break the build, or that the build will succeed but include broken sources. If your build is complex, use build avoidance with caution, and test to make sure the build still runs correctly when you change different parts of the project.
CCache is an open source compiler cache that helps with build avoidance. It works with GCC and with GCC-compatible compilers such as Clang.
Run a full build using your regular compilation command, but prefix the command with the word ccache. CCache will “step in” instead of the regular compiler and run the build, while caching outputs for next time.
The next time you run the build, make sure the results of the previous build are still in the target directory. You will now “re-build” on top of this previous build.
Run your compilation command again, prefixing it with ccache. CCache will now run an incremental build, detecting which build sources haven’t changed and supplying them from the cache, while rebuilding sources that have changed.
If the build was successful, test to make sure it ran correctly. Note that every time you make changes to a different part of the codebase, you will have to re-test to make sure the incremental build did not break anything.
Examine your logs to see the performance improvement with CCache compared to running the full build.
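To measure the improvement without relying on leftover build output, you can also run two clean builds in a row: ccache keeps its cache in its own directory under your home directory by default, so the second build should get cache hits even after make clean. The sketch assumes a hypothetical Makefile that compiles with $(CC) and has a `clean` target; `ccache -z` zeroes the statistics and `ccache -s` prints the hit/miss counts.

```python
import subprocess
import sys
import time
from typing import List, Optional


def timed_run(cmd: List[str], cwd: Optional[str] = None) -> float:
    """Run a command, raise if it fails, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def compare_ccache_builds(build_dir: str) -> None:
    """Two clean builds through ccache; the second should be served mostly from the cache."""
    subprocess.run(["ccache", "-z"], check=True)  # zero the hit/miss counters
    for label in ("first build", "second build"):
        subprocess.run(["make", "clean"], cwd=build_dir, check=True)
        # "CC=ccache gcc" overrides Make's CC variable, prefixing each compile with ccache.
        seconds = timed_run(["make", "CC=ccache gcc"], cwd=build_dir)
        print(f"{label}: {seconds:.1f} s")
    subprocess.run(["ccache", "-s"], check=True)  # print cache statistics


if __name__ == "__main__" and len(sys.argv) == 2:
    compare_ccache_builds(sys.argv[1])  # usage: python compare_ccache.py PROJECT_DIR
```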
As you probably know, disk access is a major bottleneck in many computing operations, especially builds, which read and write many files on your build server’s hard drive. A simple way to overcome this bottleneck is to install software that creates a “RAM drive” (open source options are available) and perform the entire build operation there. This moves all your build artifacts into RAM and can significantly reduce build time.
By the way, an alternative to a RAM disk is running builds on a solid-state drive (SSD), but of course this requires investing in additional hardware.
A few caveats to running builds on a RAM disk:
Performance improvement will depend on read/write vs. CPU-intensive activity – if there are a large number of files and read/write operations, a RAM disk should yield a big improvement. But in some builds, most of the running time is spent on CPU-intensive operations on the same files, without many reads and writes. In that case a RAM disk will not provide a major improvement, and a better option is upgrading to faster CPUs or parallelizing the build across several cores.
A RAM disk is volatile and has limited space – you’ll probably need to copy build sources to the RAM disk every time you want to run a build, and after the build runs, copy the output off the RAM disk to avoid losing it if the machine shuts down, and to conserve space on the disk because memory is limited.
The RAM disk becomes a shared resource – in many organizations there are multiple teams running builds. If you have only one build server, the RAM disk becomes a resource that must be shared/prioritized between the teams. You’ll need to have some sort of scheduling or allocation system, and someone to manage it, because one RAM disk will not be able to accommodate numerous builds running in parallel.
Find a RAM disk program that supports your build server’s environment. You should be able to find one that is freeware or open source. Here is a list of RAM drive software from Wikipedia.
Install and configure the RAM disk.
Copy build sources to the RAM disk and run compilation as usual.
Copy build output off the RAM disk.
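The copy-in, build, copy-out cycle above is easy to script. A sketch, assuming a RAM disk is already mounted at a path you pass in (on Linux, a tmpfs mount such as `mount -t tmpfs -o size=4G tmpfs /mnt/ramdisk` works; the size and mount point are only examples) and that the build writes its results to a subdirectory of the source tree. The function name and arguments are hypothetical. Each build gets its own scratch directory, so several builds can share one RAM disk, and the scratch directory is deleted afterwards to free memory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def build_on_ramdisk(src: Path, ramdisk: Path, build_cmd: List[str],
                     output_subdir: str, dest: Path) -> None:
    """Copy sources onto the RAM disk, build there, and copy the output back to disk."""
    work = Path(tempfile.mkdtemp(dir=ramdisk))  # unique scratch dir per build
    try:
        shutil.copytree(src, work / "src")                       # copy sources in
        subprocess.run(build_cmd, cwd=work / "src", check=True)  # build as usual
        shutil.copytree(work / "src" / output_subdir, dest, dirs_exist_ok=True)  # copy output off
    finally:
        shutil.rmtree(work)  # free the RAM whether or not the build succeeded


if __name__ == "__main__" and len(sys.argv) == 5:
    # usage: python ramdisk_build.py SRC RAMDISK OUTPUT_SUBDIR DEST   (runs `make`)
    build_on_ramdisk(Path(sys.argv[1]), Path(sys.argv[2]), ["make"],
                     sys.argv[3], Path(sys.argv[4]))
```

`dirs_exist_ok` requires Python 3.8 or later.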
For a real-life example of RAM disk usage with a large build, see this article on Code Project (scroll down to Step 3).
In C++ projects, most .cpp files include large headers that do not change from build to build. Instead of needlessly recompiling these lines of code with every build, you can precompile the headers to save build time.
Precompiled headers work similarly to build avoidance, described earlier: header files that are used throughout the project are compiled once, cached, and reused throughout the build, saving compilation time.
Note that your compiler must provide support for that feature, and not all compilers do.
Choose the headers you want to precompile, and start by compiling them using the /Yc compiler option (see the documentation of this option on MSDN).
A PCH (precompiled header) file is created. Save it for inclusion in subsequent builds.
In your Makefile, include the PCH files instead of the actual sources. An alternative method is to use the /Yu compiler option (see documentation of this option). Make sure you conform to the consistency rules for using PCH files – most importantly, you must use the PCH files on the same system environment you created them on.
Run the build and test to see that the precompiled headers are included correctly.
See a visual example of a build process with precompiled headers, and a sample Makefile with use of PCH files, provided by MSDN.
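For MSVC, the create/use pattern above comes down to two kinds of cl.exe invocation. A sketch that generates them; the file names pch.h, pch.cpp and project.pch are placeholders. Conventionally pch.cpp contains only `#include "pch.h"`, and every source compiled with /Yu should include pch.h as its first line, because the compiler skips everything before that #include when /Yu is used.

```python
from typing import List


def pch_commands(pch_header: str, pch_source: str, sources: List[str],
                 pch_file: str = "project.pch") -> List[List[str]]:
    """cl.exe invocations: one /Yc compile creates the PCH, then /Yu reuses it."""
    create = ["cl", "/c", f"/Yc{pch_header}", f"/Fp{pch_file}", pch_source]
    use = [["cl", "/c", f"/Yu{pch_header}", f"/Fp{pch_file}", src] for src in sources]
    return [create] + use


for cmd in pch_commands("pch.h", "pch.cpp", ["parser.cpp", "render.cpp"]):
    print(" ".join(cmd))
# cl /c /Ycpch.h /Fpproject.pch pch.cpp
# cl /c /Yupch.h /Fpproject.pch parser.cpp
# cl /c /Yupch.h /Fpproject.pch render.cpp
```

The same lines translate directly into Makefile rules: one rule produces the .pch file, and the object rules list it as a prerequisite.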
The quick fixes we discussed above are all limited, either because they provide only a small performance improvement or because they can cause the build to break or run inconsistently.
In this section we’ll quickly review solutions that are more complex to implement, but can provide bigger benefits in terms of build speed or consistency.
This is the obvious solution to long builds: distribute the problem across several physical machines. This is similar to running builds in parallel on one machine with Make -j and similar techniques, but here the build tasks are distributed across a cluster of machines, which is more complex. An open source tool typically used for this is Distcc (note, though, that it offers a limited type of distribution, since it only distributes compile processes).
Complex setup and investment in hardware – a distributed build requires you to procure and set up several dedicated machines, install the distributed build software on all of them, make sure they communicate with each other correctly, and manage and update this infrastructure over time.
Implicit dependencies can break the build – The same limitations we discussed earlier regarding Make -j apply here: if your build has dependencies that are not explicitly defined in the Makefiles, the parallel build will fail sometimes, if by chance build items are run in parallel to their dependencies. The larger the build cluster, the higher the probability of errors in the build.
Shared drive becomes a bottleneck – in almost all distributed build techniques, there is a shared hard drive that the machines in the cluster use to read and write build artifacts. Very often the same files – for example, header files – are shared and used across many build items, and so are accessed in parallel by several machines in the cluster. We’ve worked with several development teams who have attempted “DIY” distributed builds, and have seen that at some point (usually around 10 machines) the shared disk access becomes a bottleneck and it’s impossible to scale further.
Clock synchronization can break the build – Make depends heavily on timestamps to manage the build process. When running on several machines, small differences between the machines’ clocks can lead to wrong decisions. For example, if a target file was built on a machine whose clock is 2 minutes behind, another machine might find this file, decide it is “old”, and build it again, resulting in major inconsistencies. This issue can be resolved by precisely synchronizing the clocks of all machines participating in the parallel build.
Node failure breaks the entire build – for example, if one of the machines in the cluster is restarted, or its operating system crashes, the build items belonging to that machine won’t run, and the build will likely fail.
Overhead of invoking jobs – the time taken to invoke jobs (e.g. with ‘rsh’) can become prohibitive as the cluster grows, and will cancel out some of the performance improvement.
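A small worked example shows how clock skew misleads the timestamp rule in both directions. This sketch assumes each machine stamps files with its own clock (on a shared network drive, the file server may assign timestamps instead); the times are in seconds and purely illustrative.

```python
def looks_stale(source_mtime: float, target_mtime: float) -> bool:
    """Make's rule: a target is out of date if it is older than its source."""
    return target_mtime < source_mtime


def stamp(true_time: float, clock_offset: float) -> float:
    """Timestamp a machine records when its clock is off by clock_offset seconds."""
    return true_time + clock_offset


# Source saved at true time 1000 on machine A, whose clock is correct.
src = stamp(1000, 0)

# Case 1: machine B is 120 s behind and builds the target at true time 1030.
target = stamp(1030, -120)              # recorded as 910
print(looks_stale(src, target))         # True: an unnecessary rebuild

# Case 2: machine C is 120 s ahead and builds the target at true time 1030;
# afterwards the source is edited on A at true time 1100.
target = stamp(1030, +120)              # recorded as 1150
edited_src = stamp(1100, 0)             # recorded as 1100
print(looks_stale(edited_src, target))  # False: the edit is silently ignored
```

The second case is the more dangerous one: the build succeeds but contains an object compiled from the old source.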
Some organizations have taken the extreme step of manually breaking up a build into “components” – a small number of self-contained steps that can be run in parallel on different machines or in different stages of the development process. This is different from distributed builds in which the same build structure runs on multiple machines, resulting in consistency problems. Here, Makefiles are first re-architected to enable them to run separately and still build correctly.
Requires refactoring your entire build process – Makefiles need to be rewritten and re-organized, which can be difficult and error-prone. If you have a large system of legacy Makefiles, you will need to analyze exactly how they are built and which dependencies exist, and untangle them, which often amounts to a massive undertaking.
Requires refactoring your code base – because the build will now be running in several completely separate components, source code needs to be re-architected as well to make sure there are no includes or dependencies in the code between the different components.
Requires specialized expertise – you will need staff with advanced knowledge in Makefile internals to re-architect the build. This is a special skill set that is not possessed by most developers or build engineers.
Typically yields only a small speedup – because of the difficulty involved, Makefiles will be partitioned into a small number of components, limiting the ability to parallelize or scale up the build.
Requires ongoing maintenance – after you artificially break up the build – and effectively your entire software project – into components, you’ll need to make sure no future changes violate this partitioning. This introduces complexities in ongoing development and requires supervision by the build team to make sure there are no cross-chunk dependencies.
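Part of this supervision can be automated. A minimal sketch of a check that flags #include directives crossing component boundaries, assuming each top-level directory is one component and that quoted include paths are written relative to the repository root (both are assumptions about your layout; angle-bracket includes are ignored). The function names are our own.

```python
import re
from typing import Dict, List, Tuple

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def component(path: str) -> str:
    """Assumption: the top-level directory names the component ("net/socket.h" -> "net")."""
    return path.split("/", 1)[0]


def cross_component_includes(sources: Dict[str, str]) -> List[Tuple[str, str]]:
    """sources maps a repo-relative path to its text; return offending (file, include) pairs."""
    violations = []
    for path, text in sources.items():
        for inc in INCLUDE_RE.findall(text):
            if component(inc) != component(path):
                violations.append((path, inc))
    return violations


sources = {
    "net/socket.cpp": '#include "net/socket.h"\n#include "ui/widget.h"\n',
    "ui/widget.cpp": '#include "ui/widget.h"\n',
}
print(cross_component_includes(sources))  # [('net/socket.cpp', 'ui/widget.h')]
```

Run as part of CI, a check like this fails the build as soon as a change introduces a cross-chunk dependency, instead of relying on reviewers to spot it.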
In this solution, the build is kept in one piece, and the Makefiles are rewritten to make them run more efficiently. While this is a smaller effort than partitioning Makefiles (because the overall structure of the Make build stays largely the same and refactoring the codebase typically isn’t required), Makefile optimization is a “black art” that only a few developers have mastered.
To achieve meaningful performance gains, you need detailed information about what is wrong in the build process and exactly where the bottlenecks lie, which requires in-depth analysis of the Makefiles and their explicit and implicit dependencies. At the outset of the project you will not know the underlying issues or the performance improvement you could achieve, but in some cases Makefile optimization can yield a major improvement.
Requires specialized expertise and a major effort – make sure you have someone on staff who is proficient in Makefile internals and has a lot of time to devote to the project.
You need to know where the problem lies – this is often the most difficult part of Makefile optimization, especially in large and complex builds.
Requires ongoing maintenance – even after Makefiles are optimized and a performance improvement is achieved, the build process will tend to “drift” towards an un-optimized state, as new build artifacts and new Makefiles are introduced. From time to time, the Makefiles will need to be optimized again as new inefficiencies are introduced.
A “Unity Build” is a build that includes all .cpp files into a single compilation. We have seen cases in which this method provided major improvements in build speed.
Unity builds are faster because they avoid reparsing common headers across compilations. For example, if you have two compile steps for “foo.obj” and “bar.obj” and they both include “MyMonsterHeader.h”, and those two compiles happen separately, then your build as a whole ends up reading and (critically) *parsing* MyMonsterHeader.h twice. If you slurp that all into a unity build, and you’ve done everything else correctly, then that header is only read and parsed once, no matter how many source files include it.
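Creating the unity file itself is mechanical: it is a single .cpp file that #includes every other .cpp file, and the build compiles it instead of the individual files. A sketch of a generator; the file and function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def unity_source(cpp_files: Iterable[str]) -> str:
    """Text of a unity .cpp file that #includes each source file once."""
    lines = ["// Generated unity build file: compile this instead of the files below."]
    lines += [f'#include "{name}"' for name in sorted(cpp_files)]
    return "\n".join(lines) + "\n"


def write_unity_file(src_dir: Path, out_name: str = "unity.cpp") -> Path:
    """Write the unity file next to the sources, skipping any previous unity file."""
    names = [p.name for p in src_dir.glob("*.cpp") if p.name != out_name]
    out = src_dir / out_name
    out.write_text(unity_source(names))
    return out


print(unity_source(["foo.cpp", "bar.cpp"]), end="")
# // Generated unity build file: compile this instead of the files below.
# #include "bar.cpp"
# #include "foo.cpp"
```

Keep in mind that file-local names (static functions, anonymous-namespace symbols) that were private to separate .cpp files now share one translation unit and can collide, which is one reason unity builds require the source changes described below.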
Requires refactoring your entire build process – Makefiles need to be rewritten and re-organized to support the unity build structure. For complex builds this can be an enormous effort.
Requires refactoring your code base as well – Unity Builds require changes to actual source files. For example, a header that a .cpp file includes will probably not be in the same place or have the same structure. You will need to revise large portions of the code base to make sure nothing breaks in the transition to a Unity Build.
A cheap alternative is a RAM disk – our explanation of how to run your build on a RAM disk addresses the same I/O bottleneck. While it provides a smaller performance improvement, it is also an order of magnitude easier to implement.
Also note that Unity builds basically break your ability to do incremental builds or build avoidance, because everything is recompiled in one shot.