CloudBees Accelerator annotation files are a fantastic way to get a grip on your build behavior and performance, but what if your Build (capital B) spans more than one invocation of emake? Annotation gives you a good look inside any single invocation, but there's no way to get an overview of the entire process. You can't just catenate the annotation files from subsequent emake runs -- the result won't be well-formed XML, and the timing information for jobs in each subsection of the build will reflect time from the start of that subsection, not from the start of the logical build. Plus, you run the risk of having overlapping job identifiers in different subsections. What you need is a specialized version of cat that is annotation-aware. In this article I'll introduce annocat, a simple Perl script I wrote for just this purpose, and I'll explain how it works.
What does annocat do?
Annocat has a single purpose: concatenate a series of annotation files from real emake invocations into one annotation file representing a single logical build. In order to do this correctly we have to do a few transformations on the original data:
Job identifiers from each source file are rewritten so they are scoped by the build identifier of the build containing that job. For example, if we have a job with identifier J0830fdc0 in build 12345, annocat will replace that identifier with J123450830fdc0, not only in the <job> tag but also anywhere else the identifier appears in annotation, such as in the <waitingJobs> tag. This transformation ensures that we don't have collisions between job identifiers in different source builds.
Timing information is adjusted so that it is relative to the start of the logical build, rather than relative to the start of any individual actual build.
Environment and properties blocks are discarded from all but the first real build. The metrics block is discarded entirely.
Additional pseudo-jobs are created in the result to tie all the real builds together into a single logical build. The logical build represents the actual builds as submakes spawned from a series of serialized jobs in a synthetic make instance.
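To make the identifier rewrite concrete, here is a minimal sketch in Python (not the Perl from annocat itself). The function names are hypothetical, and the sketch assumes that every job identifier starts with "J" and that a list of waiting jobs is a whitespace-separated string of identifiers; the real annotation format may differ.

```python
def scope_job_id(job_id, build_id):
    """Scope one job identifier by the identifier of its build.

    Assumption: identifiers look like "J" followed by hex digits,
    as in the J0830fdc0 example above.
    """
    if not job_id.startswith("J"):
        raise ValueError(f"unexpected job identifier: {job_id!r}")
    # Keep the leading "J", then insert the build id before the rest.
    return f"J{build_id}{job_id[1:]}"


def scope_id_list(id_list, build_id):
    """Apply the same rewrite to a whitespace-separated list of identifiers."""
    return " ".join(scope_job_id(j, build_id) for j in id_list.split())


if __name__ == "__main__":
    print(scope_job_id("J0830fdc0", "12345"))
    print(scope_id_list("J0830fdc0 J0830fe10", "12345"))
```

Applying the rewrite to individual attribute values, rather than to the raw text of the file, avoids accidentally touching ordinary words that happen to look like identifiers.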
How do I use annocat?
Annocat works like the standard cat utility, but on annotation files. Download it here, then invoke it like this:
perl annocat.pl build_1234.xml build_1235.xml build_1236.xml > combined.xml
After running annocat, you can load the result in ElectricInsight and run all your favorite reports.
How does annocat work?
Annocat is a simple Perl script that uses the standard Perl streaming XML parser XML::Parser to process the annotation file one tag at a time. I chose the streaming parser because annocat does not need to track a lot of state, so we don't need the sophistication of a DOM-style parser. I chose to implement annocat in Perl rather than Tcl-and-annolib because I wanted to show how you might work with annotation data in Perl; because you don't need all the power of annolib for this simple task; and because I wanted to remind myself how much I dislike Perl.
The basic premise of annocat is simple: as each tag is read from a source annotation file, annocat checks the type of the tag and performs any required transformations, then prints the tag to standard out. Although it's straightforward stuff, the final script is a few hundred lines of code, so I won't go through it line-by-line here. I will point out a couple of tricky bits, however.
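The read-transform-print loop can be sketched in a few lines. The example below is in Python rather than Perl, using the standard xml.parsers.expat module, which is a streaming parser in the same spirit as XML::Parser. The copy_annotation function and its transform callback are hypothetical names for illustration, and the sketch returns the output as a string instead of printing it directly.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


def copy_annotation(xml_text, transform):
    """Re-emit xml_text one tag at a time.

    transform(name, attrs) is called for every start tag and returns the
    (possibly modified) name and attribute dict to emit.
    """
    pieces = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        name, attrs = transform(name, attrs)
        rendered = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        pieces.append(f"<{name}{rendered}>")

    def end(name):
        # Note: empty elements come back out as <x></x>, not <x/>.
        pieces.append(f"</{name}>")

    def chars(data):
        pieces.append(escape(data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return "".join(pieces)


def identity(name, attrs):
    """A transform that changes nothing."""
    return name, attrs


if __name__ == "__main__":
    print(copy_annotation('<build id="1"><job id="J1"/></build>', identity))
```

Because each handler sees only one tag at a time, any transformation that needs context (such as the current build identifier) has to carry that context in variables outside the handlers, which is why a little global state goes a long way in a script like this.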
First is the bit where annocat adjusts timing information. A global variable, gElapsed, tracks the elapsed time as of the start of the annotation file currently being processed. This variable starts at zero and is updated after each annotation file is completed. You can see the update in the main loop of the program, around line 287. When annocat emits the timing data for a job, around line 153, it just adds the elapsed time to the real timing data extracted from the annotation file, thereby shifting the logical start time by the required amount.
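The time shift is easy to picture with a small Python sketch. Two things here are assumptions rather than facts about annocat: that each job's timing is carried in attributes named invoked and completed, and that the elapsed time added after each file is the latest completion time seen in that file. The function names are hypothetical.

```python
def shift_timing(attrs, elapsed):
    """Return a copy of one timing record shifted by elapsed seconds.

    Assumption: start and end times live in "invoked" and "completed".
    """
    shifted = dict(attrs)
    for key in ("invoked", "completed"):
        shifted[key] = f"{float(attrs[key]) + elapsed:.6f}"
    return shifted


def concatenate_timings(builds):
    """builds is a list of files; each file is a list of timing records."""
    elapsed = 0.0  # plays the role of gElapsed: starts at zero
    result = []
    for timings in builds:
        result.extend(shift_timing(t, elapsed) for t in timings)
        # Updated only after the whole file has been processed.
        # Assumption: a file's duration is its latest completion time.
        elapsed += max((float(t["completed"]) for t in timings), default=0.0)
    return result


if __name__ == "__main__":
    first = [{"invoked": "0.5", "completed": "2.0"}]
    second = [{"invoked": "0.25", "completed": "1.0"}]
    for record in concatenate_timings([first, second]):
        print(record)
```

In this example the job from the second file, which originally ran from 0.25 to 1.0, lands at 2.25 to 3.0 in the logical build, immediately after the 2.0 seconds consumed by the first file.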
Second is the bit where annocat emits pseudo-jobs to provide the top-level structure of the logical build. The tricky part is determining when to emit these jobs, and making sure that they are emitted in the correct context -- that is, in keeping with the annotation format, the rule job that spawns a submake must immediately precede the opening <make> tag for the submake, and the rule job should list the parse job of the submake as a waitingJob. Since we're using a streaming parser, by the time we get to that parse job, unfortunately, we will already have processed and emitted the opening <make> tag. So we have to do something a little more clever around the start of each build: instead of blindly copying the first <make> tag for the build to standard out, annocat buffers that tag temporarily, until it gets to the first <job> tag in that build. Then we have the information we need to create the fake rule job, so we do so, and only then do we emit the buffered <make> tag. Switching into buffered mode occurs around line 126, when annocat detects that it has found a new <build> tag; emitting the fake rule job occurs around line 181, when annocat detects the first job in the new build. Around that line you'll also see where annocat outputs the follow job for the previous build.
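The buffering trick can be sketched independently of any XML handling. The Python example below works on a simplified stream of (tag, attributes) pairs for a single build. The stitch_build name, the Jrule identifier scheme, and the attribute names on the synthetic job are all hypothetical; the point is only the ordering: hold back the opening <make> tag, emit the fake rule job as soon as the first real job is seen, then release what was held.

```python
def stitch_build(events, build_id):
    """Reorder one build's events so a synthetic rule job precedes <make>.

    events is an iterable of (tag, attrs) pairs in document order.
    """
    out = []
    held = []               # events held back since the first <make>
    seen_first_job = False
    for tag, attrs in events:
        if seen_first_job:
            out.append((tag, attrs))        # normal pass-through mode
        elif tag == "job":
            seen_first_job = True
            # The first job is taken to be the submake's parse job, so the
            # fake rule job can now name it as a waiting job.
            rule = {"id": f"Jrule{build_id}", "type": "rule",
                    "waiting": attrs["id"]}
            out.append(("job", rule))
            out.extend(held)                # only now release <make>
            out.append((tag, attrs))
        elif held or tag == "make":
            held.append((tag, attrs))       # buffered mode
        else:
            out.append((tag, attrs))        # anything before <make>
    if not seen_first_job:
        out.extend(held)                    # no jobs at all: drop nothing
    return out


if __name__ == "__main__":
    stream = [("build", {"id": "12345"}), ("make", {"level": "0"}),
              ("job", {"id": "J1", "type": "parse"}),
              ("job", {"id": "J2", "type": "rule"})]
    for event in stitch_build(stream, "12345"):
        print(event)
```

The cost of this approach is a tiny amount of extra state, which is much cheaper than switching to a DOM-style parser just to look one tag ahead.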
Although it is functional as currently implemented, there's more that could be done with annocat. First, it would be nice if it didn't just dump the environment data from the second and subsequent real builds. One way to handle it would be to move it under the first make instance in the build and use the environment-level annotation format to capture the deltas between the new environment and the environment for the first build. Second, it would be nice if annocat didn't drop the data in the <metrics> sections. One solution would be to aggregate the metrics from all source builds and emit a single unified block of metrics for the logical build. I leave these enhancements as an exercise for the reader.