Untangling Parallel Build Logs

Written by: Electric Bee

I spend most of my time with CloudBees Accelerator working on the "big" features -- performance, scalability, fault-tolerance. It's easy to forget that there are a ton of "little" features that can themselves make a big difference in the value of the system. Case in point: the build log. If you have any experience with parallel build systems, you know what a mess the build log becomes because you have any number of parallel commands all dumping output to a single logfile simultaneously. The output from each command gets interleaved with the output from other commands. Worse, the error messages get jumbled up too, so it becomes difficult to tell which commands are producing the errors.
I was reminded of this issue when it popped up again on the GNU make-help mailing list. Take a look at this recent post :

In parallel mode (with -j option), the outputs from different rules are intermixed. I'm wondering if there is a way to order the outputs as if make is run in serial mode, but it should still achieve the speed of parallel mode.

Unfortunately, as this poster discovered, there is no really good solution to this problem, at least not unless there is help from the build tool itself. Some people have tried tricks like colorizing the output by piping each command through a filter that sets the font color before printing the output, but any such solution is going to be unreliable because it doesn't address the fundamental problem of having multiple processes trying to write output simultaneously. Plus it's a real nuisance to modify your makefiles to add the filter.

How CloudBees Accelerator Does Build Logs

Seeing that message on make-help reminded me that I take for granted the way that Accelerator handles the build log. Simply put, Accelerator emits the build log in correct serial order, without any interleaving of output from different commands -- even though those commands are running in parallel and frequently out-of-order. The log you get at the end of the build is identical to the log you would have gotten if you'd run a single-threaded, serialized build.
Think about that for a second: the build log is emitted in serial order. No more scrambled build logs. No more kludgy partial solutions, because we solved the problem at the level of the build tool. You just get consistent, predictable, useful logs. Every time.
Consider this trivial makefile:

  all: a b c

  a b c:
          @for n in 1 2 3 4 ; do echo $@-$$n && sleep 1 ; done

Now compare the output when run by regular old serial gmake with the output when run by parallel gmake -j 4 , and output from Accelerator (emake) (red text indicates places where the output differs from serial gmake, bold text indicates places where the output differs from the previous run of the same make variant):

serialparallel gmakeparallel gmakeparallel gmakeemakeemakeemake
a-1 a-2 a-3 a-4 b-1 b-2 b-3 b-4 c-1 c-2 c-3 c-4a-1 b-1 c-1 b-2 a-2 c-2 a-3 b-3 c-3 b-4 a-4 c-4a-1 b-1 c-1 a-2 b-2 c-2 a-3 b-3 c-3 a-4 b-4 c-4a-1 c-1 b-1 a-2 c-2 b-2 a-3 c-3 b-3 a-4 c-4 b-4a-1 a-2 a-3 a-4 b-1 b-2 b-3 b-4 c-1 c-2 c-3 c-4a-1 a-2 a-3 a-4 b-1 b-2 b-3 b-4 c-1 c-2 c-3 c-4a-1 a-2 a-3 a-4 b-1 b-2 b-3 b-4 c-1 c-2 c-3 c-4

The parallel gmake output is different from serial gmake on every run, and what's worse, the parallel runs are not even consistent with each other from one run to the next! On the other hand, the output from Accelerator is identical to serial gmake, every time.

Anything we can do, we can do better

Of course we're never satisfied with simply solving a problem that plagues users around the world. We want to provide a solution that is so far superior to the alternatives that you'd be crazy not to want it. Therefore, we created annotated build logs , or simply annotation , in late 2003. Annotation is a version of your build log marked up with XML to provide bundles of information above and beyond the standard build output log, in a format that is easily searched and manipulated. This is the solution that everybody wishes they had , and the solution that the developers of other parallel make tools wish they had thought of first. Here's a bit of the annotation for our example build:

  <job file="Makefile" name="a" neededby="J01513660" id="J01511fb0">
  <argv>for n in 1 2 3 4 ; do echo a-$n && sleep 1 ; done
  </argv>
  <output src="prog">a-1
  a-2
  a-3
  a-4
  </output>
  </job>
  <job file="Makefile" name="b" neededby="J01513660" id="J01511fb8">
  <argv>for n in 1 2 3 4 ; do echo b-$n && sleep 1 ; done
  </argv>
  <output src="prog">b-1
  b-2
  b-3
  b-4
  </output>
  </job>

There are a few things in particular that I want to point out in this snippet:

  1. Every job in the build is reported separately from every other job, with the commands and output from each easily identifiable.

  2. You can generate the traditional build output log from annotation simply by concatenating the text in the <output> tags.

  3. Annotation includes even those commands that are not echoed to the standard build output log because of the use of the "silent" prefix (@) in the Makefile!

If you are comfortable with XML you can probably already see the benefits here: you could easily put together some scripts with Perl or even just with xmlgrep and xmlfmt that search the content of <output> tags for things that look like warnings or errors, then reports precisely which job produced that output. Alternatively, you can use our annolib library to build that script:
#!/usr/bin/tclsh
load annolib.so
set anno
set xml
$anno load $xml
close $xml
$anno jobiterbegin
while { } {
set job
set match false
foreach commandBlock {
foreach {lineNumber argv outputs} $commandBlock {
foreach {src text} $outputs {
if { } {
set match true
}
}
}
}
if { $match } {
puts "Problem found in job $job ():"
foreach commandBlock {
foreach {lineNumber argv outputs} $commandBlock {
puts "t$argv"
foreach {src text} $outputs {
if { $src == "prog" } {
puts "t$text"
}
}
}
}
}
}
Just like that, you have a simple script you can use as a starting point for your own post-build reporting on errors and warnings in your build.
So with CloudBees Accelerator, you get the best of both worlds: fast parallel builds, and build logs that you can actually use. Just another way that Accelerator rocks.

Stay up to date

We'll never share your email address and you can opt out at any time, we promise.