According to Gartner, the average data center utilization worldwide is around 10 to 15 percent, which isn’t great for resource efficiency. The leaders in resource utilization, Google and Netflix in particular, do a lot better at 50 to 70 percent.
Unfortunately, resource efficiency is probably going to get worse if we don’t do anything about it. Public cloud and automation tools make it easy to over-provision. Often that’s the only way to handle complexity and unpredictable demand (after all, it’s generally better to over-provision than to fall over).
Demand for public cloud is increasing as enterprises migrate to it and new workloads like IoT are added. Estimates of the increase in demand for data center capacity by 2020 range from 50 to 300 percent.
How Can We Improve Resource Efficiency in Tech?
In March 2016, The Economist estimated that data centers account for 2 percent of global energy usage. For comparison, the aviation industry uses 2 to 2.5 percent of global energy, and people campaign against new runways. Maybe that comparison is unfair, given that aviation is a rather energy-efficient sector. Fuel is a large share of an airline’s costs, so keeping energy usage down is a direct competitive advantage.
Meanwhile, in the tech sector, we’re less motivated by cost savings and more by how quickly we can ship new features or products. The popular microservices architecture lets autonomous teams ship features very quickly. However, deploying one microservice per VM often reduces server utilization compared with an old-style monolith.
Container orchestration
The good news is that orchestration can improve data center efficiency a lot. That’s exactly how Google and Netflix achieve their much higher utilization figures.
Containers can be scaled in real time, whereas it takes minutes to add capacity to VMs. So rather than add new machines, you could just repurpose existing infrastructure to focus on temporarily urgent tasks. We call this “microscaling” to differentiate it from traditional autoscaling with VMs. Using microscaling this way, Netflix and Google reduced their energy use and their hosting costs at the same time.
Microscaling can also be used with VM autoscaling to refocus resources to meet urgent demand while buying time for more VM capacity to come online. This potentially makes systems more resilient without excessive over-provisioning.
Microscaling engine
Orchestrators enable you to build useful tools for managing your containerized workloads. All the orchestrators (Docker Swarm, Mesos, Kubernetes, Nomad, ECS) have clean, RESTful APIs that let you look at what’s running across your whole cluster and stop and start tasks.
If your tasks can be stopped and restarted quickly (“cattle, not pets”), then you can potentially decide to stop some tasks that aren't vital to meeting a particular demand peak and use the freed resources to scale up or scale out the services that are vital. This can happen in real time because containers and orchestrators will let you repurpose existing infrastructure in seconds.
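For example, the same list/stop/start operations against the Docker Remote API look roughly like this when run on the Docker host (a sketch, not production code: <container-id> is a placeholder, and curl needs version 7.40 or later for --unix-socket):
# List everything running on this host via the Docker Remote API.
curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json
# Stop a non-vital task to free up resources, then start it again later.
curl -s -X POST --unix-socket /var/run/docker.sock http://localhost/containers/<container-id>/stop
curl -s -X POST --unix-socket /var/run/docker.sock http://localhost/containers/<container-id>/start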
Some of the orchestrators have support for microscaling based on CPU and memory. This can be useful, but there are also problems -- a container could be using a lot of CPU because the code is inefficient, for example. Rather than using CPU and memory, it’s better to use metrics linked to customer activity.
To achieve this, you’ll need access to real-time demand metrics. CloudWatch and similar services often run minutes behind, which isn’t fast enough. Initially, the demand metrics we’re using are queue lengths taken straight from live systems: load balancers like NGINX, or message queues like SQS or RabbitMQ.
If the queue length is above the target, we scale up containers, and then scale down once the backlog is processed. Any spare capacity can be used for a lower-priority task. So far, we support the Docker Remote API and Marathon/Mesos, with more orchestrators to come.
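To make the idea concrete, here’s a toy version of that scaling loop as a shell script. It is not the real engine: the stats URL, the jq path (which varies between NSQ versions), the target depth, and the "worker" service name are all assumptions you’d adjust for your own setup.
# Toy microscaling loop: poll queue depth, scale a "worker" service up or down.
QUEUE_STATS="http://localhost:4151/stats?format=json"   # nsqd's HTTP stats endpoint
TARGET=100      # queue depth we're aiming to stay below
MIN=1; MAX=10   # bounds on the number of worker containers
REPLICAS=$MIN
while true; do
  # Sum the depth of every channel on every topic (jq path may differ by NSQ version).
  DEPTH=$(curl -s "$QUEUE_STATS" | jq '[.topics[].channels[].depth] | add // 0')
  if [ "$DEPTH" -gt "$TARGET" ] && [ "$REPLICAS" -lt "$MAX" ]; then
    REPLICAS=$((REPLICAS + 1))
  elif [ "$DEPTH" -lt "$TARGET" ] && [ "$REPLICAS" -gt "$MIN" ]; then
    REPLICAS=$((REPLICAS - 1))
  fi
  docker-compose scale worker=$REPLICAS   # repurpose capacity in seconds, not minutes
  sleep 5
done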
You can try the real engine yourself with Docker Compose, using it to scale a local NSQ queue. You can find the code on GitHub at microscaling/microscaling.
Metadata, containers, and orchestration
But we quickly hit a problem with microscaling. How do we know whether a particular container image is for a vital service or for something noncritical like a batch job? The orchestrator also needs to know the CPU and memory limits that should apply to a container. The developer of the container should have a good idea of the resources it will require, but how can that information be passed to the orchestrator, which may be managed by a separate team?
We need some way to add metadata to container images to tell us stuff like that. Fortunately, Dockerfiles have an official way to do this. While building our microscaling engine, we realized that metadata and labels are the key to scaling and the effective use of orchestrators.
In Docker v1.6, Red Hat contributed a mechanism for adding metadata to Docker container images: labels. These give you a standard way to add notes or licensing conditions, for example, to an image. In fact, the only problem with labels is that they aren’t used more (less than 20 percent of the time).
Schema and formatting
Labels are free-text key/value pairs. That’s flexible, but potentially untidy, unsafe, and inconsistent, particularly because:
You can label an image or container multiple times.
Containers inherit labels from their images.
New keys overwrite old keys with the same name.
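The last two points are easy to see for yourself with a quick demo (the image name, container names, and label key are just placeholders):
# Build a tiny image that carries a single label.
docker build -t label-demo - <<'EOF'
FROM alpine:3.4
LABEL com.example.service-class="critical"
EOF
# A container inherits the image's labels...
docker run -d --name demo1 label-demo sleep 60
docker inspect --format '{{ json .Config.Labels }}' demo1
# ...and a run-time label with the same key overwrites the inherited value.
docker run -d --name demo2 --label com.example.service-class="batch" label-demo sleep 60
docker inspect --format '{{ json .Config.Labels }}' demo2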
Fortunately, Docker has defined some formatting recommendations for labels:
Namespace your keynames with the reverse DNS notation of your domain (e.g., com.mydomain.mykey).
Don’t use com.docker.x, io.docker.x, or org.dockerproject.x as the namespace in your keynames. Those are reserved.
Use lowercase alphanumerics, . and - (as in [a-z0-9-.]) in keynames, and start and end with an alphanumeric.
Don’t use consecutive dots and dashes in keynames.
Don’t add labels one at a time with individual LABEL instructions. Each LABEL instruction adds a new layer to the image, so that’s inefficient. Add multiple labels in one call where you can:
LABEL vendor=ACME\ Incorporated \
      com.example.is-beta= \
      com.example.is-production="" \
      com.example.version="0.0.1-beta" \
      com.example.release-date="2015-02-12"
These guidelines are not enforced, but tools will undoubtedly come along to do that (e.g., github.com/garethr/docker-label-inspector).
Labels to Use
There’s been a lot of debate about default and orchestrator-specific labels (see the References section at the end of this post). To get started, we decided to add a minimal initial set of labels to our own images. These labels should be useful for most public Docker images. They don’t yet include the microscaling metadata.
"Labels": { "org.label-schema.build-date": "2016-06-22T08:39:00Z", "org.label-schema.docker.dockerfile": "/Dockerfile", "org.label-schema.license": "Apache-2.0", "org.label-schema.url": "https://microscaling.com", "org.label-schema.vcs-ref": "995bb0a", "org.label-schema.vcs-type": "git", "org.label-schema.vcs-url": "https://github.com/microscaling/microscaling.git", "org.label-schema.vendor": "Microscaling Systems" }
Label-Schema.org
We also helped set up and joined the label-schema.org community to help define a default namespace for standard labels. We wanted to get community agreement on the correct, most useful labels to add to any container by default.
Unfortunately, you can’t currently compute label values dynamically inside a Dockerfile. You could just hardcode values like the build date, but they easily get out of date, and that’s a bit yuck. Instead, Docker recommends using the ARG instruction to pass these dynamic values into the Dockerfile at build time:
docker build --build-arg BUILD_DATE=`date -u +"%Y-%m-%dT%H:%M:%SZ"` \
             --build-arg VCS_REF=`git rev-parse --short HEAD` .
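On the Dockerfile side, a minimal fragment for consuming those build arguments might look like this (only the build-date, vcs-ref, and vcs-url labels from the set above are shown; the others can be set the same way):
# Declare the build arguments, then bake their values into label-schema labels.
ARG BUILD_DATE
ARG VCS_REF
LABEL org.label-schema.build-date=$BUILD_DATE \
      org.label-schema.vcs-ref=$VCS_REF \
      org.label-schema.vcs-url="https://github.com/microscaling/microscaling.git"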
In our case, we’ve built a makefile to automate passing dynamic labels into our Dockerfile, which is also in our GitHub repo. Feel free to take a look.
MicroBadger
To encourage standard label use, we also created a website (microbadger.com) to display the metadata of public images on DockerHub. If you're a maintainer of a public image, the website lets you add badges to your DockerHub page and your GitHub Readme. On DockerHub, you can show your users the exact git commit that was used to build the image. This is made possible by the labels you add to your image.
Conclusion
The combination of containers and orchestration is a powerful one -- it's why Docker has just bundled its Swarm orchestrator into the Docker Engine. With container orchestration, we can use infrastructure more effectively and move from the 10- to 15-percent resource utilization we’re currently achieving to something more like 50 percent. That’s vital if data centers are not to become the new polluters and inefficient energy users of the 21st century.
The first step in using orchestrators and post-orchestration tooling effectively will be container metadata. Think about:
Adding basic labels to your images.
Contributing to the label-schema.org community and suggesting standard labels to add.
Finally, think about the energy use of your applications. Are they bloated, or running in VMs bigger than they need to be? Your planet needs you!