Update: We have released a free ebook about our workflow: Efficiency in Development Workflows.
Quick iteration is key to developing a new product and finding product-market fit. When we started Codeship, it was clear that we needed to automate as much as we could. Otherwise we wouldn't be able to succeed with a small team.
Immutable Infrastructure and Immutable Servers have been central to how we deploy our test server infrastructure from the beginning. Being able to terminate and recreate our servers at any time makes our development a lot faster, and it also leads to a more stable system for our customers. If you want to learn more about Immutable Infrastructure and Immutable Servers, listen to this talk with Chad Fowler.
Put together, this means we can iterate quickly while keeping our server management overhead small.
How we continuously deploy and replace our build servers
Our build servers run on EC2. Whenever we start a new server, it automatically connects to our build queue and starts testing our customers' projects. Decoupling the system through a queue (in our case Sidekiq) was an easy way to make it more resilient to interruptions.
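To illustrate why the queue makes individual servers disposable, here is a minimal Python sketch. It does not use Sidekiq's API: Python's `queue.Queue` stands in for the shared build queue, threads stand in for build servers, and the names `build_server` and `run_builds` are hypothetical. Each simulated server only knows how to pull the next job, so servers can be added or removed without reconfiguring anything else.

```python
import queue
import threading


def build_server(name, jobs, results, lock):
    """Simulated build server: once started, it just keeps pulling jobs."""
    while True:
        project = jobs.get()
        if project is None:  # sentinel: this server is being shut down
            jobs.task_done()
            return
        with lock:
            results.append((name, project))  # stand-in for actually running the build
        jobs.task_done()


def run_builds(projects, server_count):
    """Enqueue projects and let `server_count` simulated servers drain the queue."""
    jobs = queue.Queue()  # hypothetical stand-in for the Sidekiq-backed build queue
    results = []
    lock = threading.Lock()
    servers = [
        threading.Thread(target=build_server, args=(f"server-{i}", jobs, results, lock))
        for i in range(server_count)
    ]
    for server in servers:
        server.start()
    for project in projects:
        jobs.put(project)
    for _ in servers:  # one shutdown sentinel per server, queued after all real jobs
        jobs.put(None)
    for server in servers:
        server.join()
    return results


if __name__ == "__main__":
    for server, project in run_builds(["app-a", "app-b", "app-c"], server_count=2):
        print(f"{server} built {project}")
```

Because whoever enqueues a build never addresses a specific server, starting or terminating a server does not require any change on the producing side.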
Feature Branches and Pull Requests
Continuous Deployment was part of our development workflow from day one. All parts of our infrastructure need to be deployed automatically.
Our infrastructure currently consists of two parts. The first part, written in Ruby, is responsible for executing your build and deployment steps and streaming the corresponding output back into the database. We test this part of our infrastructure with RSpec and some shell scripts.
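The core job of that first part, running a step and streaming its output as it is produced, can be sketched in a few lines of Python. This is an illustration only, not Codeship's Ruby code: `run_step` is a hypothetical name, and `save_line` is a hypothetical callback standing in for the database write.

```python
import subprocess
import sys


def run_step(command, save_line):
    """Run one build step, passing each output line to save_line as it arrives."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the log reads like a terminal
        text=True,
    )
    for line in proc.stdout:  # iterate over output lines as the child writes them
        save_line(line.rstrip("\n"))
    return proc.wait()  # the exit code decides whether the step passed


if __name__ == "__main__":
    stored = []  # hypothetical stand-in for the build-output table
    code = run_step([sys.executable, "-c", "print('installing'); print('testing')"], stored.append)
    print(code, stored)
```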
The second part takes care of building the virtual machines that are used by the Ruby scripts to run your build. We use Vagrant to provide a production-like system on every development machine.
Any change is implemented on a feature branch and merged into our main branch with a Pull Request. This is the same workflow we presented in one of our recent blog posts.
Building Amazon Machine Images
Whenever a feature branch is merged, we run all RSpec tests on the Codeship. If they pass, we start a new EC2 server and push our setup scripts to it. Using nohup, we start a background process on the new server that installs and configures everything necessary to run our builds. This currently takes about an hour, but we are working on a quicker way to build our machines.
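The "push the scripts, then start them with nohup" step might look roughly like this. It is a sketch under assumptions: it shells out to `scp` and `ssh`, and the host, the remote path `/tmp/setup.sh`, the log path `/tmp/setup.log`, and the helper names are all hypothetical. Redirecting all three standard streams lets the SSH session return immediately while nohup keeps the setup running on the server.

```python
import shlex
import subprocess


def setup_commands(host, local_script, remote_script="/tmp/setup.sh", log="/tmp/setup.log"):
    """Build the two commands: copy the setup script, then launch it detached."""
    copy = ["scp", local_script, f"{host}:{remote_script}"]
    remote = (
        f"nohup bash {shlex.quote(remote_script)} "
        f"> {shlex.quote(log)} 2>&1 < /dev/null &"  # detach fully so ssh can return
    )
    return [copy, ["ssh", host, remote]]


def start_setup(host, local_script):
    """Run both commands; raises CalledProcessError if either one fails."""
    for command in setup_commands(host, local_script):
        subprocess.run(command, check=True)
```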
https://gist.github.com/flomotlik/6534645.js
One of the first improvements will be to decouple the deployment of the Ruby scripts that manage the build from the virtual machine the build runs in. Tools like Docker make this possible, but there are still some issues we need to resolve. Packer is another great tool for building virtual machine images, and we plan to transition to it in the future.
After the setup is complete, we check that everything was set up correctly. To make sure all the languages and tools we support are in place, we maintain a GitHub repository called vm-tester. It connects to each of our supported databases and verifies that they all work. We run vm-tester on the new server as an integration test.
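A check in the spirit of vm-tester can be sketched with the standard library alone. The tool list, database ports, and function names below are hypothetical examples, not the contents of the actual repository; the point is that the script exits non-zero when anything is missing, so the surrounding pipeline can refuse to create an image.

```python
import shutil
import socket
import sys


def missing_tools(tools):
    """Return the tools that cannot be found on PATH (or at the given path)."""
    return [tool for tool in tools if shutil.which(tool) is None]


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds, e.g. to a database server."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical checklist; the real vm-tester covers Codeship's supported stack.
    tools = ["ruby", "node", "java", "git"]
    databases = {"postgresql": 5432, "mysql": 3306, "redis": 6379}
    problems = [f"missing tool: {t}" for t in missing_tools(tools)]
    problems += [f"{db} not reachable" for db, port in databases.items()
                 if not port_open("127.0.0.1", port)]
    print("\n".join(problems) or "all checks passed")
    sys.exit(1 if problems else 0)
```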
When the integration tests pass, we create a new AMI from the server. We use this AMI to start additional servers when our resources are saturated.
https://gist.github.com/flomotlik/6534751.js
Creating the AMI will restart the server. As soon as it is back online, it connects to the queue and starts processing builds automatically.
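With boto3, the AWS SDK for Python, the image step could be sketched as below. The helper name and the timestamped naming scheme are hypothetical; `create_image` is the EC2 client call, and passing `NoReboot=False` (EC2's default) is what makes the instance restart during image creation. The client is passed in so the function can be exercised without AWS access.

```python
from datetime import datetime, timezone


def create_build_ami(ec2, instance_id, prefix="build-server"):
    """Snapshot a tested build server into a new AMI and return its ImageId.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    name = f"{prefix}-{datetime.now(timezone.utc):%Y%m%d-%H%M%S}"  # hypothetical naming scheme
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=name,
        NoReboot=False,  # EC2's default: reboot the instance for a consistent snapshot
    )
    return response["ImageId"]


# Usage (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   image_id = create_build_ami(ec2, "i-0123456789abcdef0")
#   ec2.get_waiter("image_available").wait(ImageIds=[image_id])
```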
Infrastructure Metrics and Server Management
We started using Librato Metrics a while ago to measure server and application metrics. We track many different metrics, from the number of currently running builds to successful and failed builds and the timing of different parts of our application.
Librato lets us combine application and server metrics to get a good overview of our current system status.
[caption id="attachment_402" align="alignnone" width="885"]
A look at our Infrastructure Metrics with Librato Metrics[/caption]
Having all those metrics, and being able to add new ones anywhere in our application, is great. Whenever we see a problem with our infrastructure, we make sure there is some way to measure it and set alarms for unexpected behavior.
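As an illustration of "alarms for unexpected behavior", here is a small threshold rule in Python. It is not Librato's alerting API: the `Alarm` class, the metric name `builds.running`, and the bounds are hypothetical. Requiring several consecutive out-of-range samples keeps a single noisy reading from paging anyone.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    metric: str
    low: float
    high: float
    for_samples: int = 3  # consecutive bad samples required before firing

    def check(self, samples):
        """Return True if the last `for_samples` values are all outside [low, high]."""
        if len(samples) < self.for_samples:
            return False
        recent = samples[-self.for_samples:]
        return all(v < self.low or v > self.high for v in recent)


if __name__ == "__main__":
    # Hypothetical rule: no running builds for three samples suggests a stalled queue.
    stalled = Alarm("builds.running", low=1, high=200)
    print(stalled.check([12, 8, 0, 0, 0]))
```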
Managing our server infrastructure is easy thanks to the tools we've built into our admin application. We can disable or terminate any of our servers, or start a new one from any AMI we've created. This allows us to react very quickly to changes or problems in our infrastructure.
[caption id="attachment_403" align="alignnone" width="885"]
We can start a new server at any time by clicking the link in our admin application:
[caption id="attachment_400" align="alignnone" width="885"]
Amazon Machine Images[/caption]
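Behind buttons like these, starting and terminating servers maps onto two EC2 API calls. The sketch below uses boto3's `run_instances` and `terminate_instances`; the function names and the default instance type are hypothetical, and the client is passed in so the functions can be tested without AWS. Disabling a server is not shown.

```python
def start_server(ec2, image_id, instance_type="c3.large"):
    """Launch one build server from an AMI and return its instance ID.

    `ec2` is expected to be a boto3 EC2 client; the instance type is an example.
    """
    response = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


def terminate_server(ec2, instance_id):
    """Terminate a build server; with immutable servers we replace, never repair."""
    ec2.terminate_instances(InstanceIds=[instance_id])


# Usage (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   new_id = start_server(ec2, "ami-0123456789abcdef0")
#   terminate_server(ec2, "i-0fedcba9876543210")
```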
Without immutable infrastructure and cloud services this workflow wouldn’t be possible.
Rebuilding our test servers regularly ensures that we always start from a fresh, clean machine. We never have to worry about configuration changes made by someone else or about problems caused by long-running servers.
Rolling back to any earlier version really is just a click of a button. We can innovate quickly, knowing that if something goes wrong, the impact on our customers is minimal and can be resolved quickly.
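Rolling back amounts to starting servers from an earlier AMI. Below is a minimal sketch of choosing that AMI from the `Images` list returned by boto3's `describe_images` (each entry carries `ImageId`, `Name`, and an ISO 8601 `CreationDate`). The `build-server-` name prefix, the function name, and the `versions_back` parameter are hypothetical.

```python
def pick_image(images, prefix="build-server-", versions_back=0):
    """Return the ImageId `versions_back` steps before the newest matching image."""
    matching = [img for img in images if img["Name"].startswith(prefix)]
    # CreationDate strings share one ISO 8601 format, so they sort chronologically.
    matching.sort(key=lambda img: img["CreationDate"], reverse=True)
    if not 0 <= versions_back < len(matching):
        raise ValueError("not enough images to roll back that far")
    return matching[versions_back]["ImageId"]


# Usage (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   images = ec2.describe_images(Owners=["self"])["Images"]
#   previous = pick_image(images, versions_back=1)  # then start servers from it
```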
We see our decoupled and immutable infrastructure as one of our biggest assets.
Let us know in the comments how you deal with changes to your infrastructure and keep your servers up to date.