Makefile performance: recursive make

Written by: Electric Bee

Are you using the best method for invoking submakes in a recursive make based build? A simple change can turn on the afterburners by unlocking parallelism.

The central feature of a build system based on recursive make is, of course, the use of submakes to process subcomponents of the build. Typically you'll use one submake invocation per subdirectory in the build tree. All you need to do is run $(MAKE) -C subdir and you're off and running. That's fine if you have just one subdirectory, but what if you have several subdirectories to make in? What's the right way to set it up for optimum performance?

One rule to ... rule them all, and in the darkness bind them

One obvious approach is something like this:

	@$(MAKE) -C partA
        @$(MAKE) -C partB
        @$(MAKE) -C partC

Or the slightly more sophisticated shell-loop variant:

	@for n in partA partB partC ... ; do 
                $(MAKE) -C $$n ; 

Both of these are straightforward to implement and easy to understand -- and each torpedoes your parallel build performance. Here's why: when you run your build with gmake -j N , gmake will run up to N jobs (targets) simultaneously, limited by the number of currently runnable jobs. When one job finishes, another will be dispatched to take its place (again, limited by the number of runnable jobs). When a submake is invoked, an elaborate jobserver system is used to coordinate the parent make and the submake so that the submake can run jobs in parallel too, as long as the total number of jobs across all makes does not exceed N.
Sounds great, and it is -- until you hit a construct like one of these. Then your massive parallelism comes to a screeching halt as you dribble through each submake one at a time. Why? In these makefiles, there is only one job; and, gmake doesn't run the next command in the command script until the current command finishes, of course. That means that $(MAKE) -C partA has to run to completion before $(MAKE) -C partB will even start. Sure, you get parallelism within partA, but if partA is small, there may not be enough work there to really get the parallelism you were hoping for. Consider this contrived example:


	@$(MAKE) -C partA
	@$(MAKE) -C partB
	@$(MAKE) -C partC

partA/Makefile, partB/Makefile, partC/Makefile

all: j1 j2 j3 j4 j5

j1 j2 j3 j4 j5:
	sleep 3

If you run this build with gmake -j 20 , you might expect the entire thing to take no more than about 3 seconds -- there's fifteen total jobs that do "work", and they are all parallelizable. In reality, thanks to the limitation I've described, this build runs for 9 seconds -- three times as long.
Fortunately, there is a better way.

Let your make do the walking

The ideal way to setup your recursive makes is to use separate target for each submake:


.PHONY: partA partB partC

all: partA partB partC

partA partB partC:
	@$(MAKE) -C $@

By creating a separate target for each submake, we've moved the management of the submakes from the shell to gmake itself, and thus enabled gmake to run all of the submakes in parallel. If we run this build with gmake -j 20 , the build completes in about 3 seconds, as expected. Note that the functionality of the system is unchanged: we will still recurse into each of three directories and run the jobs in each as before. A serial build will even produce identical output, processing each directory in the same order as the original. As an added benefit, now we can invoke just one component of the build directly from the command-line, where before we had no choice but to run the entire mess.
There is one minor drawback to this approach: if there are dependencies between the submakes, you must now explicitly declare them. For example, if anything in partB depends on something from partA, you must add the dependency:

partB: partA

By breaking the submakes into separate targets we've removed the implicit serialization between them (caused by the way gmake processes commands in a makefile). To account for that we add explicit dependencies where necessary. That's just good makefile hygiene anyway though. Note that if you are using CloudBees Accelerator, you can skip adding the explicit dependencies and instead rely on Accelerator's dependency discovery; that's not only easier, but gives you even better performance.

This article is one of several looking at different aspects of makefile performance. If you liked this article, you may enjoy the others in the series:

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.