Capital One's Analytic Garage on Docker
Codeship is at DockerCon 2015! This week, we’ll be providing summaries on our blog of some of the talks we attend at this two-day conference in San Francisco.
Yesterday morning, Santosh Bardwaj spoke at DockerCon about how Capital One is using the stability and flexibility of Docker to keep up with the rapid pace of change in data analysis.
Bardwaj, a senior director of engineering at Capital One, explained that the American bank wants to build a technology foundation that ensures its leadership in analytics. Leveraging open source software is a big part of that plan.
[Photo: Santosh Bardwaj at DockerCon 2015]
The Goal
Bardwaj said that Capital One wants to enable access to the best data analysis tools for all its associates, specifically to allow for:
self-evaluation
local testing
on-demand evaluation environments
workload isolation
tool governance
To reach that goal, the platform and engineering team chose to use Docker to build an Analytic Garage. They had to engineer an effective architecture to evaluate and integrate a large variety and volume of tools and software packages. They needed something that would allow for continuous testing of tools to better meet the analytics needs of thousands of users.
The Solution
Bardwaj stated that the Analytic Garage was built to give users a separate environment in which to quickly prototype new tools. The Garage went through a few different versions before landing on this stack: Mesos Marathon, Docker, cgroups, RHEL 6.x, and GlusterFS. This stack brought the following benefits:
improved stability
improved resource utilization
more users
more tools
self-evaluation
isolated workloads
The Garage also integrated with the rest of Capital One’s Big Data ecosystem to enable agile progression of insights to deployment.
The Challenges
Now, how to get employees to use the Analytic Garage? Bardwaj said that to encourage adoption across the analyst community, the team developed a self-service UI, which offered:
a web portal to instantiate containers and analytic services
Kerberos integration with Hadoop and Hive
integrated monitoring and metrics
lifecycle management (container expiration)
highly available cluster with Mesos Marathon
shared storage using GlusterFS
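The talk summary doesn't describe how container expiration was implemented, so the following is only a minimal sketch of the idea behind lifecycle management. The `SandboxContainer` record, the `find_expired` helper, the seven-day lifetime, and the sample data are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SandboxContainer:
    """Hypothetical record a self-service portal might keep per container."""
    name: str
    owner: str
    created_at: datetime
    ttl: timedelta  # how long the container may live before it is reclaimed

    def expires_at(self) -> datetime:
        return self.created_at + self.ttl


def find_expired(containers, now):
    """Return the containers whose lifetime has elapsed at time `now`."""
    return [c for c in containers if c.expires_at() <= now]


# Arbitrary fixed timestamp so the example behaves the same on every run.
now = datetime(2015, 6, 23, 12, 0, tzinfo=timezone.utc)
fleet = [
    SandboxContainer("r-sandbox", "alice", now - timedelta(days=8), timedelta(days=7)),
    SandboxContainer("spark-trial", "bob", now - timedelta(days=1), timedelta(days=7)),
]

for container in find_expired(fleet, now):
    # A real portal would stop the container here; this sketch only reports it.
    print(f"expire {container.name} (owner: {container.owner})")
```

Keeping the expiry check as a pure function of the container list and the current time makes it easy to run on a schedule and to test without touching a live cluster.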
The team also needed to minimize the complexity of adoption. They created a virtual private server (VPS) by integrating multiple analytic services, apps, and tools into one Docker image. This approach offered a few advantages:
a familiar, data-centric sandbox image
maximum portability and performance
support for hybrid tools
reproducibility and auditability through a versioned environment
a volume-mounted tool directory for screening new tools before they are integrated into the sandbox image
the ability to instantiate containers in seconds, despite the size of the image
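To make the versioned image and the volume-mounted tool directory more concrete, here is a rough sketch of what a Marathon app definition for one sandbox might look like. The field names follow Marathon's app-definition format as commonly documented, but they should be checked against the Marathon version in use. The `sandbox_app` helper, the app ID scheme, the image name, the resource sizes, and both paths are assumptions, not details from the talk.

```python
import json


def sandbox_app(user, image, tools_dir):
    """Build a Marathon-style app definition for one user's sandbox container."""
    return {
        "id": f"/garage/{user}/sandbox",  # hypothetical naming scheme
        "cpus": 2,                         # assumed size
        "mem": 4096,                       # assumed size, in MB
        "instances": 1,
        "container": {
            "type": "DOCKER",
            # A pinned image tag is what makes the environment reproducible.
            "docker": {"image": image, "network": "BRIDGE"},
            "volumes": [
                {
                    # Tools under evaluation live outside the image and are
                    # mounted read-only, so they can change without a rebuild.
                    "containerPath": "/opt/tools",
                    "hostPath": tools_dir,
                    "mode": "RO",
                }
            ],
        },
    }


app = sandbox_app(
    "alice",
    "registry.example.com/analytics-sandbox:1.4",  # hypothetical image
    "/mnt/gluster/tools",                          # hypothetical shared path
)
print(json.dumps(app, indent=2))
```

A definition like this would typically be submitted to Marathon's REST API. The sketch stops at building the JSON so that it runs without a cluster.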
Of course, the VPS approach had a few challenges in store as well:
trial-and-error coordination of the initialization order of the services (GlusterFS, Docker, Mesos-master, Mesos-slave, Marathon)
open source GlusterFS proved fragile rather than resilient
Docker isn’t fully supported or stable on 2.x Linux kernels
cgroups bug
random reboots
the device mapper storage driver is too complex to use comfortably
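The initialization-order challenge mentioned above is essentially a dependency-ordering problem. The talk doesn't say how the team eventually encoded the order, so this is just one possible sketch using Python's standard `graphlib` module (Python 3.9+). The dependency edges are assumptions: each service is assumed to wait for the one listed before it.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services that must be up before it starts.
# These edges are assumptions for illustration, not the team's documented setup.
depends_on = {
    "glusterfs": set(),
    "docker": {"glusterfs"},
    "mesos-master": {"docker"},
    "mesos-slave": {"mesos-master"},
    "marathon": {"mesos-slave"},
}

# static_order() yields every service after all of its dependencies.
startup_order = list(TopologicalSorter(depends_on).static_order())
print(" -> ".join(startup_order))
```

Declaring dependencies explicitly replaces trial and error with a computed order, and `TopologicalSorter` raises `graphlib.CycleError` if two services end up waiting on each other.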
The Results
Bardwaj stated that, at the end of the day, the Analytic Garage on Docker has significantly reduced the time it takes his team to evaluate and onboard new tools and solutions. It’s also helped accelerate the evolution of his team’s data technology strategy.
Specifically, he said, the Garage enables them to build, test, and iterate complete app prototypes using a “LEGO block” approach; it allows different groups to easily select and use the tools that they prefer.
Workloads running in Docker perform comparably to bare metal, which lets analysts run complex models. Bardwaj said the team had been cautious on this point because VM performance hadn't been acceptable, so they've been happy to see Docker perform so well. They're currently testing different approaches to persisting databases on Docker as an enhancement to their analytic ecosystem.
Video: https://www.youtube.com/watch?v=ogDa2-A1y9U