You know that feeling when a new feature is deployed to production? It's a strange mix of excitement, relief, and fear. You're excited because you were part of this feature. Relief is a natural reaction to any accomplished task. But why does deploying to production cause fear? Well, because there are so many things that can go wrong at this stage that it's daunting to even think about them. However, not thinking about them is a surefire recipe for trouble.
When it's time to release a new version of your application, you have several options. The simplest one is to take down the old application and replace it with a new one. However, this approach requires downtime, which is not ideal. Alternatively, you can spin up a new instance of the entire application and divert all traffic to it when it's ready. This is called blue/green deployment, and it's a much better strategy because it avoids downtime. But creating an additional instance of an entire application for each deployment can be a slow and expensive process. That's where rolling deployment comes in.
Rolling deployment is good for deploying composite systems. By composite, I mean systems that are composed of several nodes that you can deploy independently. The size of the nodes is not important. They can be as big as dedicated servers running the entire monolithic application or as small as Docker containers running individual microservices. Imagine that this is your application:
You need to deploy a new version. So you start by deploying one additional node containing the new functionality:
Then you reconfigure the system such that the new node replaces one of the old nodes:
Now you effectively have the new version of the application running alongside the old version on a relatively small scale. It's up to you to decide whether to retire the unused old node immediately, or wait and monitor the new node for a while. Your ability to monitor the behavior of the newly added nodes before you decide whether to keep them is akin to a granular canary deployment. Once you're satisfied with the state of the system after node replacement, you can retire the old node that's no longer being used:
You can use this method to replace the remaining old nodes one by one. Eventually, new nodes will replace all the old ones:
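The steps above can be sketched as a small simulation. This is a minimal illustration, not how any particular tool works: nodes are represented only by their version strings, and `rolling_deploy` and `is_healthy` are hypothetical names made up for this example.

```python
from typing import Callable, Iterator, List


def rolling_deploy(
    nodes: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool] = lambda version: True,
) -> Iterator[List[str]]:
    """Replace nodes one at a time, yielding the serving set after each swap."""
    serving = list(nodes)
    for index in range(len(serving)):
        candidate = new_version            # 1. start one additional node
        if not is_healthy(candidate):      # 2. optionally monitor it first
            raise RuntimeError(f"{candidate} failed its health check")
        serving[index] = candidate         # 3. route traffic to it, retire the old node
        yield list(serving)


# Print the serving set after each replacement.
for step in rolling_deploy(["v1", "v1", "v1"], "v2"):
    print(step)
```

Each printed line shows one more node on `v2`, and the last line contains only `v2` nodes. In a real system, the health check is where you would plug in the monitoring described earlier.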
At this point, you've completed the rolling deployment. The new version of the application is now running across the entire system. As you can see, this technique is relatively complicated in terms of deployment orchestration. Fortunately, there are many deployment automation tools available that can help simplify the process. You're probably asking yourself, "Why would I use rolling deployment when it has all this additional complexity?" Glad you asked. I'm going to answer this question in the next sections.
The aforementioned optional step of monitoring the new node greatly reduces the risks associated with potential bugs. Any bugs or instability in your new node will affect only a fraction of your users. Advanced tools that support rolling deployment can even let you keep all old nodes connected and divert only a small amount of traffic to a new node during the testing stage.
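To make the idea of diverting only a small amount of traffic more concrete, here is a rough sketch of weighted routing. The node names, the weights, and the `pick_node` helper are all assumptions invented for this example; real load balancers expose this kind of setting in their own ways.

```python
import random
from typing import Dict, Optional


def pick_node(weights: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick the node that serves one request, proportionally to its weight."""
    rng = rng or random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# All old nodes stay connected; the new node gets roughly 5% of the requests.
weights = {"old-1": 47.5, "old-2": 47.5, "new-1": 5.0}
rng = random.Random(42)  # seeded only to make the sketch repeatable
hits = sum(pick_node(weights, rng) == "new-1" for _ in range(10_000))
print(f"new-1 served {hits / 10_000:.1%} of requests")
```

Raising the weight of `new-1` step by step is one way to widen the exposure gradually once the node looks stable.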
During rolling deployment, you replace the nodes one by one, which limits the number of users exposed to potential bugs. You can also roll the changes back very quickly because the old node does not get retired until the new node meets predefined acceptance criteria. Rolling back is as simple as reverting the traffic redirect. Even if you detect the failure much further into the deployment process, advanced tools can automatically roll the changes back by adding new instances of old nodes. Knowing that you can roll the system back at the first sign of a problem gives you peace of mind about confronting any bugs that might slip into production.
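Here is one way the rollback logic might look in code. It is a simplified sketch under stated assumptions: the acceptance criterion is a single error-rate threshold, and `deploy_with_rollback`, `error_rate`, and `simulated_error_rate` are hypothetical names, not part of any real tool.

```python
from typing import Callable, List, Tuple


def deploy_with_rollback(
    nodes: List[str],
    new_version: str,
    error_rate: Callable[[List[str]], float],
    max_error_rate: float = 0.01,
) -> Tuple[List[str], bool]:
    """Roll out node by node; restore the original nodes if a check fails."""
    original = list(nodes)
    serving = list(nodes)
    for index in range(len(serving)):
        serving[index] = new_version        # redirect traffic to the new node
        if error_rate(serving) > max_error_rate:
            return original, False          # revert: bring the old nodes back
    return serving, True


def simulated_error_rate(serving: List[str]) -> float:
    # Pretend the bug only shows up once two nodes run the new version.
    return 0.05 if serving.count("v2") >= 2 else 0.0


print(deploy_with_rollback(["v1", "v1", "v1"], "v2", simulated_error_rate))
```

In this run the check fails at the second node, so the function returns the original `v1` nodes together with `False`. This mirrors the case where the failure is detected further into the deployment.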
Remember how I said that blue/green deployment requires replicating the entire application? This might not be that daunting an issue if your application conveniently fits into a small number of servers, but replicating your application will become a slower and more expensive process as it grows in size. Rolling deployment mitigates this issue because it requires just a single additional node. This caps the instantaneous system overhead during the entire deployment process at the overhead of just a single node. You can even configure advanced deployment tools to perform rolling deployment in place by retiring old nodes before setting up new ones. This way, you can perform a full deployment with zero overhead, at the cost of slightly reduced capacity while each node is being replaced. In addition, advanced rolling deployment tools can replace more than one node at a time. The number of nodes replaced at once is called the window size. All these configurable parameters allow you to fine-tune your rolling deployment strategy to your specific needs. This configuration can even be dynamic. For example, if you encounter a severe security bug, you might decide to replace all nodes at once. This will effectively degenerate your rolling deployment into blue/green deployment for the duration of a single hotfix release. You can use dynamic configuration to keep the overall deployment costs low, but you always have the option to trade cost for speed when you need it.
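The trade-off between window size and overhead can be worked out with a little arithmetic. The sketch below is only an illustration: `plan_rollout` and `RolloutPlan` are hypothetical names, and it assumes every node is the same size and that an in-place replacement takes the old node out of service before the new one is ready.

```python
from typing import List, NamedTuple


class RolloutPlan(NamedTuple):
    batches: List[List[int]]   # node indexes replaced together
    peak_extra_nodes: int      # most additional nodes running at once
    min_serving_nodes: int     # fewest nodes serving traffic at once


def plan_rollout(node_count: int, window_size: int, in_place: bool = False) -> RolloutPlan:
    """Split the nodes into batches and report the resulting overhead."""
    if node_count < 0 or window_size < 1:
        raise ValueError("node_count must be >= 0 and window_size must be >= 1")
    batches = [
        list(range(start, min(start + window_size, node_count)))
        for start in range(0, node_count, window_size)
    ]
    largest = max((len(batch) for batch in batches), default=0)
    if in_place:
        # No extra nodes, but capacity dips while a batch is being replaced.
        return RolloutPlan(batches, 0, node_count - largest)
    return RolloutPlan(batches, largest, node_count)


print(plan_rollout(5, 2))                 # window of two nodes
print(plan_rollout(5, 1, in_place=True))  # in place, one node at a time
print(plan_rollout(5, 5))                 # everything at once
```

With five nodes, a window size of five needs five extra nodes in a single batch, which is the blue/green-like case described above. A window size of one performed in place needs no extra nodes but leaves four nodes serving traffic during each step.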
Rolling deployment is a very useful technique, but it does come with some trade-offs. As I already mentioned, deployment orchestration becomes more complex. Deployment automation tools can mitigate this complexity, but even then, rolling deployment often requires a considerable ramp-up. During rolling deployment, the old and new versions of the application effectively run in parallel. To keep the system stable and operational, you must make sure the new nodes are functionally compatible with the old nodes. This backward compatibility can be challenging to get right. The last caveat is not unique to rolling deployment but is important to keep in mind nonetheless. When you deploy with zero downtime, there might be sessions in progress that switch to the new functionality quite abruptly. Some users may not notice, but others might be confused or encounter problems. In some cases, this can even lead to data loss. It's up to you to decide how to handle these corner cases based on the specifics of your application.
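As a small illustration of the compatibility concern, consider a new node that has to read data produced by old nodes. The sketch below assumes a made-up order payload in which the new version adds a `currency` field. The `read_order` function and the field names are hypothetical and only show one common approach: default any field that old nodes don't send, and ignore any field you don't recognize.

```python
from typing import Any, Dict


def read_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Parse an order written by either an old or a new node."""
    return {
        "id": payload["id"],
        "amount": payload["amount"],
        # Hypothetical field added in the new version; old nodes never send it,
        # so fall back to the value the old version implicitly assumed.
        "currency": payload.get("currency", "USD"),
    }


print(read_order({"id": 1, "amount": 10}))                     # written by an old node
print(read_order({"id": 2, "amount": 20, "currency": "EUR"}))  # written by a new node
```

Because the function reads only the keys it knows about, extra fields in a payload are ignored rather than causing a failure. That keeps both versions working while they run side by side.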
Rolling deployment is a great fit for big applications composed of multiple independent nodes. These nodes can be as big as dedicated servers or as small as Docker containers. With rolling deployments, you get better fault tolerance, an ability to perform quick rollbacks, and very granular control over infrastructure overhead. You can also monitor the newly deployed nodes and the overall KPIs to ensure that only high-quality features make it into production. So long as you weigh its benefits against its technical complications, rolling deployment might be the perfect fit for your application.