Electrifying Continuous Delivery: How to Use CloudBees Accelerator to Parallelize Your Testing
Most of our customers know about CloudBees Accelerator. After all, it was the company's first product, built to radically speed up software builds by parallelizing the recipes in a makefile across any number of agents. Most of our customers have seen dramatic gains in performance, accelerating builds anywhere from 2x to 20x or more, depending on the structure of their build and how many agents they make available to the cluster manager. What most of our customers do not know is that Accelerator can do much more than accelerate builds. It can perform its magic with much more than just makefiles, and the tools to do this are already included in the product and require no additional licensing costs. In this post, I will discuss some prerequisites and techniques for accelerating other activities associated with successful Continuous Integration and Delivery, particularly that missing link in the equation: Continuous Testing.
Review: How Accelerator works
CloudBees Accelerator's best-known claim to fame is that it accelerates software builds. It does so by providing a replacement for your make tool (gmake, nmake, clearmake, etc.) that can do the job on its own, or can talk to a cluster manager to "farm out" the individual components of the build to "agents." By dividing the work among many agents and performing it in parallel, it speeds up completion of the work, reduces wait times and provides significant ROI (Return on Investment) in the process. The illustration below depicts the most common configuration and reference architecture of the CloudBees Accelerator product.

When CloudBees Accelerator's emake processes a makefile, it uses the targets and recipes to understand how to parallelize the work. Making those decisions on the basis of a patented algorithm that learns every time it is executed, it queries the cluster manager for available agents compatible with the build. Based on the configuration the customer has defined in the cluster manager, and on the platform where the work needs to be done, the work is sent to agents, which in turn return the results to the cluster manager and to the host where emake is running. The Electric File System ensures that all the information an agent needs to do the work is made available from the host, and that all the results, including logs and annotation, are sent back. Many other things take place in the background as well, but I will not go into those details today.
The concept of an agent is often misunderstood. An agent is not tied to a core in a processor: there can be more agents than cores on a host, or more cores than agents. The right configuration depends on how I/O- and CPU-bound the jobs being run on those agents happen to be. Generally, we recommend starting with a number of agents that matches or slightly exceeds the number of cores on the host. Whether the host is a virtual or physical machine is irrelevant. If the host has 12 cores and will be dedicated to Accelerator builds, you might start with 14 agents, keeping in mind that hyperthreading generally yields only 20-30% additional performance. If the host will be shared with other tasks, you might reduce that number.
What if I want to accelerate other activities?
Many customers have asked themselves (and shortly afterwards, us) if the technology in Accelerator can be used to parallelize other work. Most of them are unaware that the answer is a qualified “Yes!” There are, however, a number of caveats to keep in mind. The following information can be used if you want to accelerate existing applications, but can also be used by developers who would like to give customers the option of using their applications in an Accelerator environment.
Repeated invocation of single-process or multithreaded applications means emake only
An application that runs as a single, monolithic process, or that is multithreaded (such as Java apps), can only be sent to an agent as one process. For example, a code pattern analysis application that runs as a single process, or a unit test tool that runs under Java, can be sent to an agent as one of a number of tasks to perform, typically defined in a makefile and handled by emake. If this application is large and takes a long time to finish, it will also take a long time to finish running on an agent. If it only needs to be executed once, there is no benefit to using Accelerator, because you would only be adding the overhead of distribution to the agent, communication, and so on. If, however, you have a single-process or multithreaded application that needs to be run many times, such as a pattern and flow analysis on multiple C# projects that are part of a solution, then you could very well benefit from parallelization. Take, for example, a Visual Studio C# product composed of seven projects, against which you need to run Parasoft's dotTEST to catch any potential problems before releasing the product. Running seven analyses serially could take some time. Let's assume that the projects reside in sub-folders within the product's main folder, and you need to walk the folder structure and run the tool against each one. The sample makefile below would achieve that goal:
PROJECTS = utils core library external graphics gui xml
TARGETS  = $(addprefix go_,$(PROJECTS))

all: $(TARGETS)

go_%:
	dottestcli -nobuild -report .\project\$* -project .\project\$*\$*.csproj
If you tell emake to process this makefile from the base folder where the product code resides, it will walk each folder and run the tool on the code, producing a report within each folder. To parallelize this task with emake, you might use a command line like this one:
emake --emake-annoupload=1 --emake-annodetail=basic --emake-cm=<your_CM_IP_or_FQDN> --win32
The first two arguments would produce a basic annotation file and send it to the cluster manager. The next argument is where you would put the FQDN or IP of the cluster manager to which you want to send the requests. The last argument tells emake to run in Windows mode. emake will then run all 7 tasks simultaneously on as many agents as the cluster manager says are available.
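To put the pieces together, here is a minimal sketch of the complete invocation. It assumes the makefile above has been saved as dottest.mak in the product's main folder (the file name is my own choice for this example, not a requirement), and that emake accepts the standard -f option of the make tools it emulates, which is how it is normally used; check the Accelerator documentation for your version.

REM Run from the product's main folder; dottest.mak is the sample makefile shown above (name is an assumption)
emake -f dottest.mak --emake-annoupload=1 --emake-annodetail=basic --emake-cm=<your_CM_IP_or_FQDN> --win32

Because the seven go_ targets have no dependencies on one another, emake is free to dispatch all seven dottestcli runs to agents at the same time.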
What if the tool has a multiprocess architecture?
Here is where it gets interesting. There are tools out there, such as Parasoft's C/C++test pattern and flow analysis tool, that can run in single-process mode, or can be told to split their execution into separate processes as they work their way through a C++ project. If you run such a tool as a single process with emake, you get no benefit, because all you add is overhead. If you tell the tool to split its processing into multiple processes, every time it has to invoke one of its underlying executables to perform some portion of the analysis it will spawn a new process. But even then, if you hand the whole run to emake, you still get no benefit. Why? Because emake will send the entire run of the tool to a single agent, and everything will run on that one agent. Here, however, is where a little-known component of CloudBees Accelerator can come to the rescue. (Pay attention, developers; this is where you can shine with Accelerator.)
Riding a supersonic cloud with Electrify/ElectrifyMon
Electrify and ElectrifyMon are tools that come with the Accelerator distribution and require no additional licensing or fees. Essentially, they give you the ability to send processes to agents for parallelized execution, but they lack emake's ability to make decisions by learning from the history of previous runs, or to figure out dependencies (or the lack thereof) on their own. What Electrify does (through ElectrifyMon) is inject itself into the application to be executed and monitor for the creation of processes. When a new process is created, it compares it against the one or more processes you have said can be intercepted and sent to an agent. If there is no match, the process runs as is on the host machine. This is a crucial point to understand: you must tell Electrify which child processes it should or should not intercept. The reason is that by default Electrify intercepts all child process creation and sends it all to the agents, which may cause problems, be of no benefit, or worse yet, slow down the execution. Let us examine the particular example of Parasoft's C/C++test product. Parasoft's cpptestcli tool is the command-line interface used to begin the analysis. In most cases, it initiates an instance of Java, which then calls cpptestcc, an executable launched once per source code file being analyzed. What we want to do is parallelize the execution of cpptestcc when it runs ipro for analysis of include files, ppro for macro definitions, and cwc for pattern-based code analysis.
The final component, flow analysis, cannot be divided and parallelized because its architecture is multithreaded Java. Because we cannot separate threads from an application, we would not normally send that to an agent. However, because we have no way at this time of specifying that only some runs of cpptestcc can be parallelized, but not others, we parallelize all cpptestcc processes. To execute the parallelization, you might use a command line such as this:
electrify --emake-cm=<your_CM_IP_or_FQDN> --electrify-remote=cpptestcc --emake-annodetail=basic --emake-annoupload=1 -- cpptestcli -settings cpptestcli.settings.txt -input cpptestscan.bdf -compiler vc_10_0
- The first argument tells Electrify to which cluster manager it should send the requests.
- The second argument tells Electrify (or more specifically, ElectrifyMon) which processes it should intercept on creation. What we are telling it is that when cpptestcli calls cpptestcc, it should intercept that and send it off to an agent for parallel execution. When it is parallelized, ipro, ppro and cwc will also run on the agent.
- The next two arguments are the same as for emake, for annotation purposes.
- Everything after the two hyphens "--" is interpreted as the main program Electrify should launch and monitor, with the rest passed to that program as command-line arguments. I will explain the -settings option in a moment.
- The -input option tells cpptestcli to run the analysis on the basis of a Build Definition File that was previously created by another tool. The -compiler option tells it that the product is compiled with Visual C/C++ version 10.0.
Now, back to the -settings option. This tells cpptestcli to look for cpptestcli.settings.txt for further configuration settings. The contents of my file were:
cpptest.analyzer.flow.multiprocess.analysis=true
cpptest.analyzer.flow.multiprocess.analysis.using.socket=true
parallel.mode=Manual
parallel.max_threads=6
Remember I said that the application to be parallelized has to create its own processes so that Electrify can intercept them and send them to agents? Here is where Parasoft’s engineering team did just that.
- These options tell it to run the analysis in multiprocess mode and to launch a maximum of six processes at a time, because my test setup had a six-agent cluster.
- You can modify this to any number you want, but it should not greatly exceed the number of agents available. emake will queue parallelization tasks when no agents are available, so it is fine to have a few processes waiting for an agent; that helps hide the latency added by the overhead of the parallelization algorithm.
- When you run this command, the analysis will begin; a consolidated sketch of the full invocation follows below.
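Putting it all together, a CI job could drive this run with a small wrapper script. This is a minimal sketch, not a required setup: the script name run_cpptest_accel.bat is my own invention, and it assumes the settings shown above are saved as cpptestcli.settings.txt next to cpptestscan.bdf in the project's root folder, which is the current directory. Every flag is one already explained in this post; the ^ characters are ordinary cmd.exe line continuations.

REM run_cpptest_accel.bat -- hypothetical wrapper for a CI step (the name is an assumption)
REM Assumes cpptestcli.settings.txt (contents shown above) and cpptestscan.bdf
REM sit in the project's root folder, which is the current directory.
electrify --emake-cm=<your_CM_IP_or_FQDN> --electrify-remote=cpptestcc ^
  --emake-annodetail=basic --emake-annoupload=1 -- ^
  cpptestcli -settings cpptestcli.settings.txt -input cpptestscan.bdf -compiler vc_10_0

From the CI tool's point of view this is just another command step; Electrify handles the distribution of cpptestcc processes to agents behind the scenes.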
In my test case, a serialized analysis of the source code for the open source "putty" product took about six minutes. Using six agents on a mere Asus Intel Core i7 laptop with 8GB of memory, I was able to reduce that to as little as 29 seconds. That is a whopping 12x acceleration ratio! Imagine what you could do with all that saved time!
Supersonic flight is not for everyone – not even through clouds…
There are more caveats to keep in mind. These techniques may not always work if an application is not specifically designed to take advantage of our technology, and there may be interactions and other issues that interfere with acceleration. For example, at the time this blog was published, the integration with the Parasoft tool was not quite working properly under Linux, and would cause a Java crash with an error mentioning a corrupted heap. To address that, we are using yet another Accelerator technique, which I will discuss in a separate blog post, as it is a more technically complex solution. Remember as well that this won't work for parallelizing multithreaded applications, because we cannot split threads out of an app; we can only do that with processes, at the operating system level. I will not delve into the details, but suffice it to say that it is powerful magick and wickedly cool technology.
…but if you feel the need, the need for speed… Punch it!
Consider this statement of truth: the bane of any Continuous Delivery strategy is testing. You can have wildly fast builds with CloudBees Accelerator, and completely automated Continuous Integration and Delivery with Electric Flow, but if your Continuous Testing slows your Agile-ity to a crawl, it's all for naught, sire. Consider as well that the number one root cause of software failures is insufficient testing, and the number one reason for that is the time and effort it takes to perform full testing. When there are conflicting priorities, corners are cut and test schedules suffer as a result. Test acceleration can work wonders to dispel the excuses used to justify those actions. Some of our customers are now literally leaving tracks in the sky, saving multiple person-years and achieving testing and release milestones that were once thought impossible. Shouldn't you?
See it in action!
Shifting Left: Electrify Your Static Analysis To Accelerate Continuous Integration
Come check out my upcoming webinar to see a live demo of our integration with Parasoft and learn how you too can accelerate your tests as part of your CI flow.