I was pleased to be able to attend the Jenkins User Conference in Washington, D.C. this year, where I gave a talk on the progress of the Workflow plugin suite for Jenkins. One highlight was seeing “Jenkins Workflows with Parallel Steps Boosts Productivity and Quality” by Tom Canova of ibmchefwatson.com. Naturally the title made me curious: how were people in the field using parallelism in workflows?
The project he works on is a little unusual for someone coming from the software-delivery mindset: while the ultimate deliverable is still software, Jenkins spends most of its time running that software, rather than a compiler or automated tests. The result is a summary of a big set of online recipes, crunched through some natural language processing into a machine-friendly format. Each “build” is a dry run of Chef Watson’s preparation for the dinner service, if you will.
Since slicing & dicing all that messy web HTML can take a long time, Tom’s process follows a pretty standard three-stage fork-join model. In the first stage, one Jenkins slave finds a site index with a list of recipes, collecting every recipe to be processed. In the main, second stage, a number of distributed slaves each pick up a subset of recipes, parse them (using a 5 GB heap), and dump the JSON results into Cloudant. Finally all the results are summarized and archived, and some follow-on jobs are triggered (I think in part as a workaround for missing Workflow plugin integrations). All told, the parallelization can cut a twenty-hour build down to two hours, giving developers much quicker feedback. Doing this from a traditional “freestyle” project would be tough: you would really need to set up a custom grid engine instead of using the Jenkins slave network you already have.
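For the curious, a fork-join like this can be sketched in a Workflow script roughly as follows. To be clear, this is my own illustrative reconstruction, not Tom’s actual script: the helper scripts, file names, and chunk count are stand-ins, and only the `node`, `stage`, `parallel`, `readFile`, `writeFile`, `sh`, and `archive` steps are real Workflow vocabulary.

```groovy
// Illustrative three-stage fork-join Workflow script; not Tom's actual code.
def recipes
stage 'index'
node {
    // Stage 1: one slave collects the list of recipes to process
    sh './find-recipes.sh > recipes.txt' // hypothetical helper script
    recipes = readFile('recipes.txt').readLines()
}
stage 'parse'
def branches = [:]
// Split the recipe list into roughly ten chunks, one parallel branch each
def chunks = recipes.collate(Math.max(1, recipes.size().intdiv(10)))
for (int i = 0; i < chunks.size(); i++) {
    def chunk = chunks[i] // capture a per-iteration value for the closure
    branches["slice-${i}"] = {
        node {
            // Stage 2: each slave parses its subset and uploads JSON to Cloudant
            writeFile file: 'slice.txt', text: chunk.join('\n')
            sh 'java -Xmx5g -jar parser.jar slice.txt' // hypothetical parser
        }
    }
}
parallel branches
stage 'summarize'
node {
    // Stage 3: aggregate and archive the results
    sh './summarize.sh > summary.json' // hypothetical
    archive 'summary.json'
}
```

The key point is that the `parallel` step fans work out across whatever slaves the `node` blocks can grab, so the existing Jenkins cluster does the job a custom grid engine would otherwise do.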
Another unusual aspect of Tom’s setup was that the build history was really curated. Whereas some teams treat Jenkins builds as dispensable records created and then trimmed at a furious rate, here there may only be a few a week, and each one is examined by the developers to see how their changes affected the sample output. (The analysis is put right in the build description.)
One interesting thing the developers do is interactively compare output from one build to another. After all, they want to judge whether their code changes produced reasonable changes in the result, or whether unexpected and unwanted effects arose in real data sets. For this they just do a diff (I think outside Jenkins) between build artifacts. After the talk I suggested to Tom that it would be useful for “someone” to write a Jenkins plugin which displays the diff between matching build artifacts of consecutive builds. This reminded me of something my team started producing when I worked on NetBeans: a readable summary of the changes in major application features from one build to the next.
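Until such a plugin exists, a Workflow script could approximate the comparison itself. A minimal sketch, assuming the previous build’s artifact has somehow been staged into a `previous/` directory (how to fetch it is left open, since the artifact-copying integrations were exactly what was missing at the time):

```groovy
// Illustrative only: diff this build's summary against the previous build's.
// Assumes previous/summary.json was staged by some earlier step.
node {
    // diff exits nonzero when files differ, so swallow that for the build result
    sh 'diff -u previous/summary.json summary.json > summary.diff || true'
    archive 'summary.diff'
}
```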
As a final note, I did try to get some meal advice from the live system. Whether I can convince my wife to let me cook this is another matter:
Basque Red Beet Pasta Salad
1 poblano pepper
½c cranberry juice
1½c crumbled queso blanco
3T achiote paste
5 red beets
3c cubed, peeled butternut squash
3 halved tomatoes
¼c olive oil
½T chopped candied ginger
Hmm. Looks like Jenkins still has its work cut out for it!