Speeding Up Your Docker-Based Builds with Codeship
UPDATE: As of January 1st, 2017, we rebranded our hosted CI Platform for Docker from “Jet” to what is now known as “Codeship Pro”. Please be aware that the name “Jet” is now only used for our local development CLI tool. The Jet CLI is used to locally debug and test builds for Codeship Pro, as well as to assist with several important tasks like encrypting secure credentials.
“Codeship is awesome, but how can we make our builds faster?”
This is how a lot of conversations start once people have configured Codeship CI/CD with Docker and have their project testing and deploying successfully.
Speeding up your Docker build times isn’t a luxury. It can save your development team hours per day and improve your ability to respond to bugs and production issues, improving the experience of your customers and stakeholders. It also makes your investment in a CI/CD tool like Codeship even more valuable, letting you do more with the service every day.
At Codeship, we wanted to put together some best practices for investigating why your builds might be slow, as well as show you a few ways you can speed them up.
Where Are You Losing Time?
The first thing you want to do when beginning to troubleshoot your build speeds is to find out how long each step in your workflow is taking. Are you losing more time to image downloads, container building, fetching dependencies, or running tests?
Through the online UI, you can examine your logs in detail to see where the big jumps and gaps in your timestamps are.
[Figure: Logs of an image push]
[Figure: Logs of a test output]
[Figure: Logs of an image downloading]
[Figure: Logs of a cached image being built]
Once you’ve learned a little bit more about the various steps of your CI/CD process and how long they take, you should be able to lock onto a few paths for potential problems and solutions:
If you’re spending too much time downloading base images, investigate caching, downloading images, building more efficient images and running efficient services.
If building your containers is taking a lot of time, look at improving how you handle dependencies and creating efficient services for CI.
If your tests are taking the bulk of your time, consider parallelizing, build your services with tests in mind, and keep an eye on resource and infrastructure usage.
Caching is one of the most common ways to speed things up. In our case, we’re talking about caching the Docker image build layers created during your CI process. The main benefit of this is that you don’t need to rebuild complex or large images over and over. You can just download and reuse the image from last time if nothing in that image (i.e., your dependencies, codebase or assets) has changed.
It’s important to know that even if some things change (your codebase should, for instance), you don’t need to rerun everything -- only everything dependent on the change. The further down the Dockerfile your volatile content is, the less time your image build will take to rerun.
The first thing you need to do with caching is to enable it. To do that, you add a simple cached: true directive to your codeship-services.yml file for every image you want cached. The next thing you’ll want to do is see if caching is working. Push a build to your repo to kick off a new Codeship build and, once it’s complete, review your logs.
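As a minimal sketch of the cached: true directive described above, a service definition might look like the following. The service name app and the image name myapp are placeholders borrowed from the example later in this post, not required values.

```yaml
# codeship-services.yml
app:
  build:
    image: myapp                 # placeholder image name
    dockerfile_path: Dockerfile
  # Ask Codeship to reuse image layers from a previous build where possible
  cached: true
```

The directive is set per service, so each service whose image you want cached needs its own cached: true line.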
Near the top of your logs, you should see lines displaying that your cached image is being used:
If you see that, then your cache is working. If you see output like this:
Or an error like this:
Then we've got a problem. This means your cache is not working. Let’s take a look at a few reasons why caching often doesn’t work.
When you start using caching, no cache will exist. A build has to be run on every branch and/or on master to create the initial cache (or cache fallback) image.
The cache was invalidated due to a COPY high up in your Dockerfile, or due to upstream changes in your base image. This will not appear as an error, but rather as though you were not using caching for that part of the image build.
Dockerfile and Dependencies
Now let’s look at your application images themselves; that is, your Dockerfiles and specifically how they build your assets and dependencies into your final images. We'll look at a few common reasons why building images from Dockerfiles can take a bit too long, as well as a few possible solutions.
Move complexity from your test scripts into your Dockerfile
Your dependencies may be installing every time when they don’t need to! Typically you will want to create folders, workdir, users, etc., and then include things like apt-get update before finally installing all your dependencies. Next, run any other commands and add additional files.
The more complexity you can move from your test scripts into your Dockerfile, the more reusable your image is within a CI execution. Ideally you can split things up so that for each command needed to run, the bare minimum number of files are added to support the command. We also recommend grouping as many related commands together, within individual layers, as possible.
However, if you find too many things are creating layers that are too “volatile” to be cached reliably, you should experiment with separating them out into separate, logical layers. There can be a bit of push-and-pull to find the right balance that works per project.
Install dependencies into a private base image
Additionally, you can consider installing all of your dependencies into a separate, private base image that you pull in and link to your service. This way, compiling your services is not even part of your core base image, and you have fewer monolithic pieces and can optimize separately.
Split your COPY commands
In your Dockerfile, you can also try splitting up your COPY into several smaller COPY commands whenever only certain files are needed for something, like a RUN or a directory prep. You can also make sure that you order these sections in a way that places your most "unstable" files as far down the Dockerfile as possible, minimizing cache invalidation and reducing how many layers need to be rebuilt.
For more information on optimizing your Dockerfile, we have a specific guide right here.
Now we’ll take a look at a few solutions involving your codeship-services.yml file.
First, consider using step-specific minimal service files. Let’s say your Rails app requires Redis and Postgres containers. You may not need those dependent services for all of your tests, so running them is going to cost you extra time (over and over again, if they’re not utilized on multiple steps).
As a solution, you could easily define different versions of your services with different links and dependencies. Then swap those in and out throughout your CI pipeline so that each step is not starting a single container it doesn’t need. An example of this would be a “Ruby” service with just the code added for linters to use, and a full “app” service with your database and cache services linked for running tests.
Here’s a simple, high-level example:
```yaml
app:
  build:
    image: myapp
    dockerfile_path: Dockerfile
  links:
    - redis
    - postgres
redis:
  image: redis:3.0.5
postgres:
  image: postgres:9.3.6
app_ruby_only:
  build:
    image: myapp
    dockerfile_path: Dockerfile
```
Since the build payload for the app and app_ruby_only services is identical in this scenario, the image will be built once and shared between them.
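To show how the two services above could be swapped in and out per step, here is a rough sketch of a matching steps file. The step names and the rubocop and rspec commands are illustrative assumptions; substitute whatever linter and test runner your project actually uses.

```yaml
# codeship-steps.yml
- name: lint
  service: app_ruby_only   # no Redis or Postgres containers are started
  command: bundle exec rubocop
- name: tests
  service: app             # starts the linked redis and postgres services
  command: bundle exec rspec
```

With a split like this, the lint step should not have to wait for the database and cache containers to start.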
The second recommendation for optimizing your codeship-services.yml file is to consider a separate, private base image with your pre-built dependencies and link it to your main container.
As long as that separate image is being cached and your dependencies are still being installed in your original Dockerfile for redundancy and to catch any changes, this should offload the CI time for testing and building your dependency changes from your application CI process. You can even do this with just some of your dependencies, isolating the most reliable ones and leaving the ones subject-to-change in your main image build, if that use case makes sense for you.
A third way to speed up your builds based on your codeship-services.yml file is a bit more Codeship-specific. If you’re using fairly popular public base images, you can contact us about adding them to the pre-built CI environment we spin up so that they’re ready to go. We’ve got a couple dozen base images we include by default right now, and we’d be happy to add more if we know there’s demand for the increase in speed it provides.
Finally, another good option is to consider where your builds might be bottlenecking and/or hitting constraints with your infrastructure resources. We strongly recommend coming up with a few different workflows of the same pipeline -- combining parallelization, nested steps, differentiated and united containers -- and running several builds against each. You can then start to figure out what configurations impact your resource usage and therefore slow your build down.
Now, we’ll look at a few build speed optimizations around your codeship-steps.yml file.
For example, parallelizing can be a great way to speed your builds up. But it also greatly increases your infrastructure usage. If you’re finding that you have slow speeds but aren’t inclined to upgrade to a more powerful instance type, do some quick tests to see if reducing your amount of parallelization actually speeds things up.
As an example of using parallel steps, you can do something like this:
```yaml
- type: parallel
  steps:
    - service: app
      command: ./script/ci/ci.parallel spec
    - service: app
      command: ./script/ci/ci.parallel plugin
    - service: app
      command: ./script/ci/ci.parallel qunit
- type: serial
  name: master_deployment
  tag: master
  steps:
    - service: deploy
      command: deploy_me_to_staging
    - service: deploy
      command: validate_staging.sh
    # notifies our team that we've released it
    - service: notifications
      command: notify_team
```
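If you suspect parallelization is straining your resources, one quick experiment is to run the same commands as a serial group and compare build durations across a few runs. This sketch reuses the app service and scripts from the example above; nothing else about your pipeline needs to change for the comparison.

```yaml
# codeship-steps.yml (serial variant, for timing comparison only)
- type: serial
  steps:
    - service: app
      command: ./script/ci/ci.parallel spec
    - service: app
      command: ./script/ci/ci.parallel plugin
    - service: app
      command: ./script/ci/ci.parallel qunit
```

If the serial variant is not meaningfully slower, the parallel version is probably contending for CPU or memory rather than saving time.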
While parallelizing with Codeship is great, you can also parallelize internally. In most languages, there are packages that allow you to run threads simultaneously, such as parallel_tests for Rails and concurrently for Node.js. This lets you parallelize actions within the existing infrastructure usage, rather than adding on to the infrastructure weight. Note that all optimizations gained here are permanent within your codebase, not specific to a Codeship build or deployment.
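As a rough illustration of internal parallelization, a single step could hand the splitting to parallel_tests instead of using several Codeship steps. This sketch assumes the parallel_tests gem is in your Gemfile, that your specs live under spec/, and that any per-process test databases are already prepared; the step name is a placeholder.

```yaml
# codeship-steps.yml
- name: specs
  service: app
  # parallel_rspec ships with the parallel_tests gem; it splits spec files
  # across several processes inside the one container.
  command: bundle exec parallel_rspec spec/
```

Because the same command works on a developer machine, the speedup is not tied to the CI configuration.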
Again, if you’re hitting a wall because of how many resources you’re using -- whether because of parallel steps, the amount of things being built, compiled, downloaded, or pushed or even just because of the size of the project -- you can always contact our help desk about it. We'll gladly take a look at just how much of your resources you’re using up and help you plan from there.
If you are interested in learning more about Codeship Jet, check out our Codeship Docker documentation page or walk through our Getting Started Guide. If you want to sign up for a free 14-day Jet trial you can do so here.