In my previous article, "Autoscaling on Complex Telemetry", we discussed a method for determining effective autoscaled cluster size from internal application metrics. That article assumed you wanted to autoscale and so did not discuss under what conditions you might choose to, let alone what your choices for scaling are. Let's go ahead and delve into all that here.
This article will use the autoscaling terminology of Amazon Web Services, but the discussion will be broadly applicable to other cloud providers.
What Is Autoscaling?
Autoscaling is a cluster-scaling technique to ensure that a compute cluster has just enough resources to meet demand, plus some use-case-determined safety margin.
Consider how things were before convenient cloud hosting. Let's say you needed to host a website for what will be a very popular movie. Close to the time of the film's release, load on your servers will be high but will tend to fall off over time. Servers take a long time to get provisioned -- plugged into racks, software installed and configured -- so you have to provision for peak consumption. This is frustrating. Time and money get spent decommissioning machines as traffic falls naturally over time, and even worse, the method is error-prone. How do you estimate traffic for something that doesn't yet exist? It's inefficient even during the period of peak traffic -- most of these fancy machines will sit idle during the night.
Autoscaling is also a capacity-planning technique that reduces the error from unknown traffic demands. It removes inefficiency by requisitioning computers when needed and decommissioning them when not.
In AWS, this is done by one-hour allotments from a pool of existing machines that may be rapidly flashed to a new machine image. AWS uses the term "group" to describe a cluster of machines. Each "group" is set with a template for machine creation and a definition of the minimum, maximum, and desired number of machines to allow in the group. The desired number of machines will be maintained unless adjusted or the minimum/maximum bounds are violated.
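To make the minimum/maximum/desired relationship concrete, here is a minimal sketch in Python. The Group class and its method names are hypothetical stand-ins, not an AWS API, and clamping an out-of-range request is an assumption made for illustration; a real provider may reject such a request instead.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for an autoscale group; not an AWS API object."""
    minimum: int
    maximum: int
    desired: int

    def __post_init__(self) -> None:
        if not 0 <= self.minimum <= self.maximum:
            raise ValueError("need 0 <= minimum <= maximum")
        # Assumption: an out-of-range desired value is pulled back into bounds.
        self.desired = self.clamp(self.desired)

    def clamp(self, value: int) -> int:
        """Pull a requested size back inside the [minimum, maximum] bounds."""
        return max(self.minimum, min(self.maximum, value))

    def set_desired(self, value: int) -> int:
        """Record a new desired size, clamped to the bounds, and return it."""
        self.desired = self.clamp(value)
        return self.desired
```

Every scaling plan discussed in this article can be thought of as something that decides what value to pass to set_desired, and when.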
Implicit in this approach is the notion that each computer in the group can be treated as interchangeable and replaceable. Meeting that requirement is a significant complication, one that might well offset the advantage otherwise gained from autoscaling. Recent work toward building non-trivial immutable infrastructures points the way to reducing this complication, but it's an active research problem.
A group's desired value is managed by a "scaling plan." At its simplest, the plan is a fixed schedule: "+10 computers at 10AM, -10 computers at 10PM." If a fixed schedule will not do, the plan can get more complicated: "dynamic scaling" adjusts the desired value in response to system telemetry.
Scaling Plan Strategies
Being aware of and choosing an effective scaling plan is the trickiest bit of the autoscaling technique. Let's explore some scaling plan approaches and their tradeoffs.
A fixed-size scaling plan is the null plan. No matter what happens, the scaling plan will keep the desired capacity of the autoscale group fixed.
This approach can be useful in the prototype phase of a project when you're unsure of what the practical behavior of your system will be. A fixed-size autoscale group will let you determine per-machine performance characteristics. It's also much simpler to operate in a time when the prototype system likely has many unknowns associated with it.
A production system that rolls out with a null scaling plan is perfectly acceptable. If you are aware that the system's demand will be relatively steady, it's extremely sensible to reduce the operational complexity of the deployed system by avoiding autoscaling.
Similarly, if you believe that the demand on your system will grow only gradually and can tolerate potential under-capacity situations, manually adjusting desired targets is perfectly acceptable.
Autoscaling is both an expense optimization and a safeguard against the unknown. If you don't have concerns in either regard, avoiding the additional complexity is well worth it.
A conceptual step up from fixed-size scaling is scheduled adjustment of that fixed size.
Say you're processing sensor telemetry from a factory floor that only runs first and second shifts, and the system can keep ahead of telemetry in online operation. In a fixed-size approach, this would mean you'd have strictly more computers than necessary from roughly midnight to 8 a.m. If the number of machines needed to support this factory during its operation is small, then this is not a concern. However, if the number of machines is large, you suddenly have an expense to optimize.
In this case, setting well-known times for computers to come online and go offline works well. Scheduled adjustments do, however, share fixed-size scaling's weakness with regard to demand: the schedule cannot react to load it did not anticipate.
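As a rough sketch of the factory example, a scheduled plan is little more than a lookup from the time of day to a desired size. The capacity numbers below are hypothetical (the example gives none), and the idle window is assumed to be exactly midnight to 8 a.m.

```python
from datetime import time

# Hypothetical capacities; the factory example gives no concrete numbers.
SHIFT_CAPACITY = 20  # machines while the factory floor is running
IDLE_CAPACITY = 2    # machines kept around overnight

# Assumed idle window: the floor is quiet from roughly midnight to 8 a.m.
IDLE_START = time(0, 0)
IDLE_END = time(8, 0)


def scheduled_desired(now: time) -> int:
    """Return the desired group size for a given wall-clock time."""
    if IDLE_START <= now < IDLE_END:
        return IDLE_CAPACITY
    return SHIFT_CAPACITY
```

Note that nothing in this function looks at load. If the factory adds a third shift without telling you, the schedule will happily scale down right when telemetry starts arriving.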
Fixed increment adjustments (simple dynamic scaling)
"Simple dynamic scaling" adjusts the desired target of the autoscale group by some pre-configured amount. This is a break from fixed-size and scheduled scaling in that the autoscale group is now a complex component of the overall system design with behavior that must be understood.
Effectively choosing a criterion to drive dynamic scaling is a complex topic, but a simple proxy for system health, CPU load, is often enough to get started.
Say your cluster target is that no machine in the cluster exceeds 50-percent CPU utilization over a five-minute period. This is easy to set up with Amazon's tools. But by how many computers should your cluster increase, or by what percentage should your cluster size increase?
This is where discretion and experimentation come in. Increase too much, and you'll spend more money and add more to the cooperation burden of your machines than you otherwise would have. Increase too little, and your system will fail to meet demand.
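The two questions above, "how many computers" and "what percentage," correspond to two flavors of the same rule. Here is a minimal sketch of both; the function names, the default increment of 2 machines, and the default increase of 25 percent are hypothetical placeholders, and rounding a fractional machine count up is an assumption.

```python
import math


def simple_scale_by_count(desired: int, max_cpu_percent: float,
                          threshold: float = 50.0, increment: int = 2) -> int:
    """Add a fixed number of machines when the busiest machine breaches the threshold."""
    if max_cpu_percent > threshold:
        return desired + increment
    return desired


def simple_scale_by_percent(desired: int, max_cpu_percent: float,
                            threshold: float = 50.0, percent: float = 25.0) -> int:
    """Grow the group by a fixed percentage when the threshold is breached."""
    if max_cpu_percent > threshold:
        # Assumption: round up so a breach always adds at least one machine.
        return desired + max(1, math.ceil(desired * percent / 100.0))
    return desired
```

Whichever flavor you pick, the response is the same size whether the threshold is breached by a hair or by a mile, which is exactly why the increment takes experimentation to choose.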
It's a challenge when experimenting to avoid over-fitting to a certain pattern of traffic: a 20-percent increase in system demand each day at lunch is much, much different than a 100-percent increase at 8 p.m. because of a promotional deal.
Fixed increment adjustments can be a challenge to get right and must be treated as a control system that is part and parcel of the developed system. Unfortunately, the approach's limitations in dealing with unusual traffic patterns and its lack of fine-grained feedback (configuration is done via a set of threshold rules) mean that it will never be entirely "set and forget."
This used to be Amazon's sole dynamic scaling option. With the introduction of step scaling, it is now superseded.
Targeted increment adjustments (step scaling)
"Step dynamic scaling" is a superset of simple scaling. With this method, it is possible to set a target criterion for the cluster, measure the difference between the cluster's output and that target, and fire one of a set of rules based on this difference.
An example will be useful. Say, as before, we're scaling on maximum CPU load in the cluster, and we have an autoscale group configured with the following rules:
If CPU > 10%, add 10%
If CPU > 25%, add 50%
If CPU > 50%, add 100%
This cluster will never decrease in size -- there are no rules for dealing with the case where our CPU usage is under target -- but that can be corrected. Here, when we're more than 10 percent over the CPU target, we add 10 percent to the desired size of the autoscale group, and so on. The additional rules allow the system to cope with unusual demand automatically.
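The three rules above can be sketched as a small lookup. This is an illustration under stated assumptions, not Amazon's implementation: the function names are hypothetical, "off the target" is read as a relative difference from the target, and fractional machine counts are rounded up. Like the example, it contains no scale-in rules.

```python
import math

# Rules from the example: (percent over target, percent to add to the group).
# Ordered from the largest breach down so the first match is the strongest rule.
STEP_RULES = [(50.0, 100.0), (25.0, 50.0), (10.0, 10.0)]


def percent_over_target(max_cpu: float, target_cpu: float) -> float:
    """Assumed reading of 'off the target': relative difference, in percent."""
    return (max_cpu - target_cpu) / target_cpu * 100.0


def step_scale(desired: int, over: float) -> int:
    """Apply the first rule whose bound the breach exceeds; otherwise no change."""
    for bound, add_percent in STEP_RULES:
        if over > bound:
            # Assumption: round up so a fired rule never adds a fraction of a machine.
            return desired + math.ceil(desired * add_percent / 100.0)
    return desired
```

For a group of 10 machines with a 50-percent CPU target, a reading of 56 percent is 12 percent over target and fires only the first rule, while a reading of 80 percent is 60 percent over target and doubles the group.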
As discussed earlier, this approach also suffers from a necessary period of experimentation but has a distinct advantage over simple scaling: You can express more complex rule-sets. This does not reduce the burden of adding a new control system into a developed system but does increase, markedly, its utility.
This approach to scaling covers, as well as any other, both cost optimization and adapting to unforeseen traffic patterns. That said, if your system does not need to cover both, either fixed-size or scheduled scaling will reduce the conceptual complexity of your system and make it more predictable.