Zero-Downtime Deployment (The DevOps 2.0 Toolkit)

In most cases, we employ deployment strategies that replace the current release with the new one. The old release is stopped, and the new one is deployed in its place. Such a set of actions produces downtime. During a (hopefully) short period, neither release is running. As a result, there is a millisecond, a second, a minute, or an even longer period during which our service is inaccessible. In today's market, such a strategy is unacceptable. If our software is not operational, our users will go somewhere else. Even if they don't, downtime produces all kinds of other undesirable effects. Money is lost, the support team is overwhelmed with calls, reputation is damaged, and so on. We are expected to be up and running 24/7.

That's part of the reason why iterations were long in the past. If deploying a new release is going to produce downtime, better not do it often. However, today we cannot afford not to release software often. Our users expect constant improvements. Even if they don't, we do. Short iterations have proved their value on all levels. Today, we see a continuous redefinition of what short means. While, not so long ago, short meant months or weeks, today it means multiple times a day. The ultimate goal? Every commit deployed to production. If deployments produce downtime, the previous sentence could be translated to: every commit creates downtime. We don't want that.

So, how can we avoid deployment downtime? Or, to put it in other words, how can we accomplish zero-downtime deployments?

Over time, two approaches have proved to be the most reliable ways to accomplish zero-downtime deployments: blue-green deployments and rolling updates.

In essence, the idea behind blue-green deployments is that at least one release is running at any given moment. The process is as follows.

We deploy the first release (we'll call it blue) and configure the proxy to redirect all requests to it. When the time comes, we deploy the second release (we'll call it green) in parallel with the previous one (blue). At this moment, the proxy is still redirecting all requests to the blue release. Once our newly deployed service is running, we can proceed with automated testing (in production) or any other type of deployment validation. When we are convinced that the new release is not only running but also working as expected, we can reconfigure the proxy to redirect all requests to it. Only once this process is finished, and all previously initiated requests have received their responses, can we stop the old (blue) release. With each new release, the same process is repeated over and over again. The third release would be blue, the fourth green, and so on.
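The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not how any particular proxy or orchestrator works: the `Release` and `Proxy` classes, the `blue_green_deploy` function, and the `validate` callback are all hypothetical names, and draining pending requests is reduced to resetting a counter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Release:
    """Hypothetical stand-in for one deployed release of a service."""
    color: str
    version: str
    running: bool = False
    in_flight: int = 0  # requests that have not yet received a response

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        if self.in_flight:
            raise RuntimeError("cannot stop a release with pending requests")
        self.running = False


@dataclass
class Proxy:
    """Hypothetical proxy that sends all traffic to a single release."""
    target: Optional[Release] = None

    def route_to(self, release: Release) -> None:
        if not release.running:
            raise RuntimeError("cannot route traffic to a stopped release")
        self.target = release


def next_color(current: Optional[Release]) -> str:
    # Colors alternate: blue, green, blue, green, ...
    return "green" if current is not None and current.color == "blue" else "blue"


def blue_green_deploy(proxy: Proxy, version: str,
                      validate: Callable[[Release], bool]) -> bool:
    old = proxy.target
    new = Release(color=next_color(old), version=version)
    new.start()  # both releases run in parallel; the proxy still points at old

    if not validate(new):
        new.stop()  # users never reached the new release
        return False

    proxy.route_to(new)  # the switch: all new requests go to the new release
    if old is not None:
        old.in_flight = 0  # assumption: stands in for waiting on pending requests
        old.stop()
    return True
```

Calling `blue_green_deploy(proxy, "1.0", check)` on an empty proxy would deploy blue, and a second call with `"2.0"` would deploy green and stop blue. If `check` returns `False`, the proxy keeps pointing at the release that was already serving traffic.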

The idea behind rolling updates is to gradually upgrade the release, one or a few instances at a time.

Let's say that we have five instances of a service running in production. When deploying the new release, we would replace one instance of the previous release and monitor the outcome for some time. As a result, we would have one instance of the new release and four instances of the old. Later on, if no anomaly is detected, we would repeat the process and end up having two instances running the new release and three instances running the old one. We would continue with the same process until all instances are running the new release. If an anomaly (undesired behavior) is detected, the instances running the new release would be stopped, and we would roll back to the old release.
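The five-instance walk-through can be expressed as a short loop. The sketch below is a simplification under assumptions: instances are represented only by their version string, and `rolling_update`, `healthy`, and `batch_size` are hypothetical names rather than the API of any real scheduler. Monitoring "for some time" is reduced to a single call to the `healthy` callback.

```python
from typing import Callable, List


def rolling_update(instances: List[str], new_version: str,
                   healthy: Callable[[List[str]], bool],
                   batch_size: int = 1) -> List[str]:
    """Return the fleet after a rolling update, or the original fleet on rollback."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    previous = list(instances)  # kept so that we can roll back
    current = list(instances)

    for start in range(0, len(current), batch_size):
        # Replace one batch of old instances with the new release.
        for i in range(start, min(start + batch_size, len(current))):
            current[i] = new_version

        # Monitor the mixed fleet before touching the next batch.
        if not healthy(current):
            return previous  # anomaly detected: roll back to the old release

    return current
```

With `["v1"] * 5` and the default batch size, the `healthy` callback sees fleets containing one, two, three, four, and finally five instances of the new release. A larger `batch_size` trades caution for speed.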

The important thing to note is that both processes assume that the new release is, at a minimum, backward compatible with the previous one. That is most evident in APIs. Since both methods assume that two releases will run at the same time, we cannot guarantee which one will be accessed by a user. The same can be said for databases. Schema changes need to be done in such a way that they work not only with the new release but also with the current one.
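One common way to meet the database requirement is to make schema changes additive, so that queries written for the current release keep working. The sketch below illustrates that idea with Python's built-in `sqlite3` module and an in-memory database. The `users` table, the `email` column, and the function names are hypothetical and chosen only for the example.

```python
import sqlite3
from typing import List, Optional, Tuple


def old_release_insert(conn: sqlite3.Connection, name: str) -> None:
    # The current release knows nothing about the email column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_release_insert(conn: sqlite3.Connection, name: str, email: str) -> None:
    # The new release writes the additional column.
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


def demo() -> List[Tuple[str, Optional[str]]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

    # Additive migration: a new nullable column does not break existing queries.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

    # Both releases run at the same time and write to the same table.
    old_release_insert(conn, "alice")
    new_release_insert(conn, "bob", "bob@example.com")

    rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
    conn.close()
    return rows
```

Rows written by the old release simply carry no value in the new column. A change such as renaming or dropping `name` would, by contrast, break the release that is still running, so it would have to wait until no instance of that release remains.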

Needless to say, those processes are easiest to implement when the architecture is oriented towards microservices. That does not mean that blue-green deployments and rolling updates do not work with other types of architecture. They do. The major difference is that the smaller the service, the faster the process. Also, smaller services require fewer resources. That applies in particular to blue-green deployments which, for a short period, duplicate resource usage.

The advantage of blue-green deployments is that we can test the deployment before making it available to the general public. On the other hand, rolling updates might be more appropriate if a service is scaled to a large number of instances and running both releases in parallel would put too much demand on resources.

Now that there is no downtime caused by the deployment process, the door opens for the implementation of continuous deployments.

The DevOps 2.0 Toolkit

If you liked this article, you might be interested in the book The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices.

The book is about different techniques that help us architect software in a better and more efficient way, with microservices packed as immutable containers, tested and deployed continuously to servers that are automatically provisioned with configuration management tools. It's about fast, reliable, and continuous deployments with zero downtime and the ability to roll back. It's about scaling to any number of servers, about designing self-healing systems capable of recovering from both hardware and software failures, and about centralized logging and monitoring of the cluster.

In other words, this book covers the full microservices development and deployment lifecycle using some of the latest and greatest practices and tools. We'll use Docker, Ansible, Ubuntu, Docker Swarm and Docker Compose, Consul, etcd, Registrator, confd, Jenkins, nginx, and so on. We'll go through many practices and even more tools.

The book is available from Amazon (Amazon.com and other worldwide sites) and LeanPub.
