A fundamental part of our recently-opened-for-beta Hudson as a Service (HaaS) is automation. Behind the scenes, we employ a number of internal resources that provision and destroy customer environments, dynamically spawn workers, and keep track of all of the moving parts. One of the tools we've come to love over these past few months is Opscode's Chef, which allows us to reliably build out our infrastructure images in a repeatable and deterministic manner.
If you're not already familiar with Chef, and you're in the IT operations business, you owe it to yourself to take some time to read about it. The idea behind Chef is that you programmatically describe the desired running state of your infrastructure. In the way we use Chef, each node checks in with a Chef server and gets an operational list of tasks to perform in order to configure the instance as desired. All important users, configuration files, packages and services are declared in such a way that the Chef client, located on each running instance, knows how the instance should be running. Making a change is as simple as pushing a new "recipe" to the Chef server - each client will pick up and run that recipe on its next run cycle, which happens every 30 minutes by default.
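To make the "describe the desired state" idea concrete, here is a minimal sketch in Python. It is not Chef code and does not use Chef's API; the `Node` class, the `DESIRED` dictionary and the `converge` function are hypothetical names invented for illustration, and the machine is just an in-memory object. The point it tries to show is that a run compares the declared state against the actual state and only acts on the difference, so running it twice is harmless.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """In-memory stand-in for a machine's actual state (hypothetical)."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)
    services: set = field(default_factory=set)


# Desired state, declared as data rather than as a script of steps.
# Package names and the file path are made up for this example.
DESIRED = {
    "packages": {"java", "hudson"},
    "files": {"/etc/hudson/config.xml": "<hudson/>"},
    "services": {"hudson"},
}


def converge(node, desired):
    """Bring node to the desired state; return the list of changes made."""
    changes = []
    # Install only the packages that are missing.
    for pkg in sorted(desired["packages"] - node.packages):
        node.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Rewrite a file only if its content differs from what is declared.
    for path, content in sorted(desired["files"].items()):
        if node.files.get(path) != content:
            node.files[path] = content
            changes.append(f"write {path}")
    # Start only the services that are not already running.
    for svc in sorted(desired["services"] - node.services):
        node.services.add(svc)
        changes.append(f"start {svc}")
    return changes


node = Node()
first_run = converge(node, DESIRED)   # a fresh node needs every change
second_run = converge(node, DESIRED)  # nothing left to do
print(first_run)
print(second_run)
```

On a fresh node the first run reports every change it made, and the second run reports none. That property is what makes a periodic client run safe to repeat.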
One appealing aspect of Chef is that it's extremely flexible and can be used, or abused, to meet your particular needs. At CloudBees, we use our Chef setup in three different ways:
First, we use Chef to build machine images. We start with a blank-slate image, which is little more than a bare Fedora, Ubuntu or CentOS image. From there, we tell Chef that we want to build this image into a certain role X, say, a Hudson controller. Chef runs its recipes and populates the proper data onto the image exactly as we have specified. This includes installing packages, writing configuration files, starting services, ensuring drives are mounted and more.
When done, we have what we call our "gold controller" of that type of image. When we need to launch a new virtual machine in the cloud, we use these gold controllers to do so - they have all of the software pre-installed, and the configurations ready to go.
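As a rough illustration of how a role can determine what gets built onto a blank image, the Python sketch below flattens a role into an ordered list of recipe names. It is only loosely modeled on the idea of a role carrying a list of things to run; the `ROLES` table, the `role:` prefix and the `expand` function are assumptions made for this example, not Chef's own syntax or our actual roles.

```python
# Hypothetical role definitions: each role lists recipes, and may include
# another role by prefixing its name with "role:".
ROLES = {
    "base": ["users", "ntp"],
    "hudson_controller": ["role:base", "java", "hudson"],
}


def expand(role, roles=ROLES):
    """Flatten a role into an ordered, de-duplicated list of recipe names."""
    ordered = []
    for item in roles[role]:
        if item.startswith("role:"):
            # Nested role: expand it in place, preserving its order.
            names = expand(item[len("role:"):], roles)
        else:
            names = [item]
        for name in names:
            if name not in ordered:
                ordered.append(name)
    return ordered


recipes = expand("hudson_controller")
print(recipes)
```

Because the list is derived from data, building the same role twice yields the same ordered steps, which is what lets an image be reproduced rather than hand-assembled.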
Second, we use it to maintain running instances. As we make changes to our software and infrastructure setup, we use Chef to get these changes out to all of our running instances. This allows us to test changes on the fly in test and staging environments, and also push these changes out to production when needed.
And third, we use it to corral information across all of our running instances. The Chef server knows about all of the running instances, and it knows many details about each of them. We can dynamically search for those details and use them to build out the infrastructure further. For example, node X may want to discover other nodes of the same type that are currently running so it can communicate with them. Chef handles this information discovery easily.
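The discovery pattern described above can be sketched in a few lines of Python. This is a toy, in-memory registry rather than a call to the Chef server; the `NODES` list, its field names, the addresses, and the `search` and `peers_of` helpers are all hypothetical and exist only to show the shape of the lookup.

```python
# Toy registry standing in for what a central server knows about each node.
# Names, roles and addresses are invented for this example.
NODES = [
    {"name": "hudson-1", "role": "hudson_controller", "ip": "10.0.0.11"},
    {"name": "hudson-2", "role": "hudson_controller", "ip": "10.0.0.12"},
    {"name": "worker-1", "role": "worker", "ip": "10.0.0.21"},
]


def search(nodes, **criteria):
    """Return every node whose attributes match all given criteria."""
    return [
        n for n in nodes
        if all(n.get(key) == value for key, value in criteria.items())
    ]


def peers_of(name, nodes):
    """Find the other nodes that share the named node's role."""
    me = next(n for n in nodes if n["name"] == name)
    return [n for n in search(nodes, role=me["role"]) if n["name"] != name]


peers = peers_of("hudson-1", NODES)
print([p["ip"] for p in peers])
```

A node can use a lookup like this at configuration time, for example to write the addresses of its peers into a configuration file, instead of having those addresses hard-coded.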
With Chef, we have cut down our development time and made our deployments more deterministic. Before Chef, we were using more traditional deployment tools that would run and execute code on the image, deploying over and over again to the same image. With that approach, it doesn't take long for things to get stale and hard to reproduce from scratch. And if an instance fails, it's not always trivial to get it up and running again. However, with Chef and our "from scratch" deployment methodology, if an instance fails, it takes nothing more than a few commands to rebuild everything from the original instance, in the exact same state, on a brand new instance. With this methodology, all instances become "throw-away." There's no need to worry, because you can always recreate your environments from scratch.
Overall, the real value of Chef is the time it saves us through the automation it brings to our infrastructure. As developers, we can focus on product instead of system administration. And, perhaps most importantly, using Chef gives us confidence that when we make changes to our infrastructure and software and deploy them, they are handled the same way each and every time.