Speeding Up Development at Lyft
Written by: Codeship
Codeship was at DockerCon 2015! This week, we’ll be providing summaries on our blog of some of the talks we attended at this two-day conference in San Francisco.
Tuesday morning, Matthew Leventi spoke at DockerCon about how Lyft uses Docker to increase developer productivity across its engineering organization.
Leventi, an engineer at Lyft, explained that Docker lets the company create a standardized development environment that includes all necessary service dependencies. This makes developers much more efficient and reduces onboarding and ramp-up time.
The Goal: Increase Productivity
Engineering at Lyft means fluid teams and a rapidly growing headcount. Everyone, Leventi stated, does devops. It’s an operation that involves 50+ microservices and 25 server deploys per day.
Lyft has the ambitious goal of enabling brand-new developers to ship code to production on Day One of employment. It’s a particularly strong measure of productivity designed to achieve faster feature development.
Leventi pointed out that common hindrances to developer productivity include:
problems with the VPN
a build that failed for reasons unknown
a deploy to production that may break the world
Lyft’s goal is to remove those types of blockers and cut down on common productivity complaints like:
“It doesn’t work on my box.”
“I don’t understand how the client got into that state.”
“It worked in development!”
“How do I get service x to talk to service y?”
“How do I test this feature from the client?”
“How do I get started working on a new team?”
In April 2014, Lyft chose to invest in standardizing and improving developer work environments. It was a marked change from the earlier approach, in which each dev maintained individual installs of many services, staying up to date on changes was a manual task, and orchestration was expensive, to name just a few bottlenecks.
Now, Leventi said, every developer at Lyft runs the standard environment Devbox. Everyone has the same up-to-date local environment, and Docker containers have all the resources for a live, running environment to test new code.
On developers’ Macs, Docker runs inside a VMware Fusion virtual machine, and Vagrant handles Lyft’s virtualization configuration. Services are started from the command line (./service start api), and Devbox can snapshot an environment for troubleshooting, analysis, or comparison.
Lyft also uses "Onebox," a standard test/integration environment that can be easily deployed to a cloud environment. That means, Leventi emphasized, all of Lyft, in the cloud, running any combination of builds, on a single EC2 instance. It’s constantly up to date, and every QA engineer has their own environment.
An open-source continuous integration service runs all integration tests between and inside services.
Implementation
Leventi described Lyft’s service model in detail:
single fat containers
stateless
fixed static IP address model
single stateful local container
auto detect code changes
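The talk summary does not say how the fixed addresses are assigned. As a minimal sketch of what a fixed static IP model could look like, the Python below maps each service name to a stable address in a private subnet. The function name, the subnet, and the starting offset are all assumptions made for illustration, not Lyft's actual scheme.

```python
import ipaddress


def assign_static_ips(services, subnet="10.0.5.0/24", first_offset=10):
    """Map each service name to a fixed address inside `subnet`.

    Hypothetical helper: names are sorted so the same set of services
    always yields the same table on every machine.
    """
    network = ipaddress.ip_network(subnet)
    table = {}
    for index, name in enumerate(sorted(services)):
        address = network.network_address + first_offset + index
        # Refuse addresses outside the subnet or on its broadcast address.
        if address not in network or address == network.broadcast_address:
            raise ValueError("subnet too small for the requested services")
        table[name] = str(address)
    return table


if __name__ == "__main__":
    for name, ip in assign_static_ips(["web", "api"]).items():
        print(name, ip)
```

Because the table depends only on the service names, every developer's environment would resolve a given service to the same address, which is one way to answer the "How do I get service x to talk to service y?" question.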
Each Docker image is a file system snapshot of config management. Building one consists of:
a git clone of a central ops codebase
a git clone of a service codebase
a SaltStack provisioning run
runit configuration for processes
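The four build steps above can be sketched as an ordered list of commands. This is an illustration under stated assumptions rather than Lyft's build script: the repository URLs, the /srv working directory, the runit directory layout under /etc/service, and the choice of a masterless Salt run (salt-call --local state.highstate) are all hypothetical.

```python
import subprocess


def build_plan(ops_repo, service_repo, service_name, workdir="/srv"):
    """Return the image build steps as argv lists; nothing is executed here."""
    service_dir = workdir + "/" + service_name
    return [
        # 1. Clone the central ops codebase (config management).
        ["git", "clone", ops_repo, workdir + "/ops"],
        # 2. Clone the codebase of the service being built.
        ["git", "clone", service_repo, service_dir],
        # 3. Provision with SaltStack (assumed to be a masterless run).
        ["salt-call", "--local", "state.highstate"],
        # 4. Register the service with runit (assumed directory layout).
        ["ln", "-s", service_dir + "/runit", "/etc/service/" + service_name],
    ]


def execute(plan, run=subprocess.check_call):
    """Run each step in order; `run` can be swapped out for a dry run."""
    for argv in plan:
        run(argv)


if __name__ == "__main__":
    plan = build_plan("git@example.com:ops.git", "git@example.com:api.git", "api")
    execute(plan, run=print)  # dry run: print the commands instead of running them
```

Keeping the plan as data makes it easy to inspect or dry-run the steps before baking them into an image.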
Running a service image follows this workflow:
Rerun the Salt provision on new SHAs
Start the runit processes
Terminate the container if the initial runit checks fail
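The run-time workflow above can be sketched as a small startup routine. The three callables are hypothetical stand-ins for the real Salt, runit, and health-check commands, which the talk summary does not spell out; the sketch only shows the ordering and the exit behavior.

```python
def start_container(wanted_sha, provisioned_sha, provision, start_runit, checks_pass):
    """Apply deltas, start processes, and report an exit code.

    All three callables are assumptions for illustration:
      provision(sha)  reruns config management for the given SHA
      start_runit()   starts the supervised processes
      checks_pass()   returns True when the initial checks succeed
    """
    # Only reprovision when the requested SHA differs from the baked-in one.
    if wanted_sha != provisioned_sha:
        provision(wanted_sha)
    start_runit()
    # A non-zero exit code tells the caller to terminate the container.
    return 0 if checks_pass() else 1


if __name__ == "__main__":
    code = start_container(
        "abc123", "def456",
        provision=lambda sha: print("provisioning", sha),
        start_runit=lambda: print("starting runit"),
        checks_pass=lambda: True,
    )
    print("exit code", code)
```

Reprovisioning only on a SHA mismatch is what lets a container pick up changes at run time without a fresh image build.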
As a result, devs can easily apply ops modifications; testing PRs is a matter of changing environment variables; and devs don’t need to wait for an image build, as deltas are applied during runs.
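To make the environment-variable point concrete, here is a minimal sketch of how a service's target revision might be selected. The variable naming convention (API_SHA for a service called api) and the master default are assumptions for illustration; the talk summary does not name the actual variables.

```python
import os


def resolve_sha(service, environ=None, default="master"):
    """Pick the revision to run for `service` from environment variables.

    Hypothetical convention: service "ride-pricing" reads RIDE_PRICING_SHA.
    """
    environ = os.environ if environ is None else environ
    key = service.upper().replace("-", "_") + "_SHA"
    return environ.get(key, default)


if __name__ == "__main__":
    # With no override set, the default branch is used.
    print(resolve_sha("api", environ={}))
    # Pointing one service at a PR's commit is a single variable change.
    print(resolve_sha("api", environ={"API_SHA": "abc123"}))
```

Under this convention, testing a pull request would mean overriding one variable for the service under test while every other service stays on its default revision.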
Each environment has its own single host: Devbox has a Mac Docker host using VMware Fusion with shared folders; the CI slave has an AWS Ubuntu Docker host for short-lived containers; and Onebox has an AWS Ubuntu Docker host for long-lived environments.
To manage state, all stateful processes run inside the same container. For Lyft, that includes:
Redis
MongoDB
SQS Local
Fake Kinesis
Leventi demoed the process for his audience with a small, sample Python web app, showing that code can be modified live and reflected without having to reload a Docker container.
The Results
As Leventi mentioned early on in his talk, a big measure of Lyft’s productivity success is whether or not a brand-new dev can push code to production on Day One of employment. And, he said, a majority do. Feature devs are no longer blocked by devops, and QA client testing is parallelized with separate but identical environments.
Of course, what is productivity without stability? Leventi stated that 99 percent of Lyft’s deploys are successful, and every pull request on every service is integration tested.
There are always lessons to learn as a company pushes the envelope on productivity. Leventi mentioned a few hurdles in particular:
VMware Fusion can be unstable under load
Frequent image downloads take time, and devs need to plan downloads
Bugs in config management can freeze development
Easy service creation leads to unnecessary services
It’s easy to approach limits on what can run on a single box
Static IP allocation isn’t supported in Docker
Lyft is currently exploring Docker usage for production, Leventi said, starting with ETL jobs in Docker. They’re also experimenting with containers to reduce auto-scale group spin-up and spin-down times, with containers for atomic deploys, and with Docker for one-time actions.
Given that Lyft’s primary goal is to accelerate productivity, it appears to have used Docker effectively to build a standardized environment that makes its developers more efficient.