Episode 86: DevOps World 2020 Award Winners - Part 2
Additional winners of the annual DevOps World 2020 awards are highlighted in this week’s DevOps Radio segment. Listen to learn more about their DevOps journeys.
Brian Dawson: Hello. This is Brian Dawson with another episode of DevOps Radio. This discussion today is a part of a special series we have where we're interviewing the CloudBees 2020 DevOps innovation award winners. Today I have with me Chafin Bryant, a senior software engineer at Capital One. Capital One was recognized with a 2020 DevOps innovation award in the category of DevOps scalability achievement.
This award is presented to honor organizations that have demonstrated the most outstanding and highly scalable continuous delivery and DevOps implementation across a team or an entire organization. Chafin's Enterprise Jenkins Platform team at Capital One provides a platform which is used by over 6,500 users to execute over 50,000 builds per day and is managing over 100 controllers, which were formerly known as masters. So I'm happy to welcome Chafin. Chafin, hello.
Chafin Bryant: Hey, Brian. How's it going?
Brian Dawson: It's going well. Welcome. Welcome. Congratulations on the award. Hopefully your team finds it an honor. There were a number of submissions and the judges went through a tough process of paring down tens and tens, if not hundreds, of submissions to a few award winners, of which Capital One and your team were recipients. So congratulations.
Chafin Bryant: Thank you.
Brian Dawson: To kick this off, can you first tell our listeners a bit about your role at Capital One, your background, and what your team does at Capital One?
Chafin Bryant: Absolutely. So like you said, my name's Chafin Bryant. I'm a senior software engineering manager at Capital One. A little bit about my background. So I've been with Capital One for five and a half years. I've worked on a bunch of different technology in my background for Capital One. I've got experience with a bunch of different aspects of the tech industry, consumer technology, enterprise technology, web development, and that kind of stuff.
In my five and a half years at Capital One I've been focused on a couple of different areas. First being production support and availability, making sure that in an always-on capacity, our applications are performing and available. Then I spent the last couple of years focused on automation within that space, automating everything from DevOps-related pipelines to automated incident remediation, those kinds of activities. For about the past two years I have been leading the Enterprise Jenkins Platform team. So like you mentioned, within Capital One we have a delivery experience organization and that organization provides some centralized DevOps tooling and capabilities for the broader enterprise.
So as the owner of one of those platforms, the Enterprise Jenkins Platform, what my team and I do is provide a highly available, resilient build platform for the enterprise to do CI/CD build, test, and deploy automation across our wide swath of developers. So my team is focused on the entire stack from a CI/CD platform perspective, everything from the underlying infrastructure, to the stuff we have built around the cloud-based Jenkins platform, to the layer on top from a pipeline functionality and plug-ins and libraries perspective, those kinds of things. Sorry, Brian. I lost – there you go.
Brian Dawson: Excellent. Excellent. Thank you for that detail. So can you tell me a bit about the project or actually the effort, because I don't think it was a large project that encompasses the platform, but the effort that was recognized in the awards program. And I'm particularly interested in learning how far or how large you have scaled it and why scale is important as you talk about the project because everybody talks about at scale. We've defined an award around scalability, but why does scale matter?
Chafin Bryant: Yeah. It's a great question. So let me try to break that down. So as far as the project or the overall initiative that is the Enterprise Jenkins Platform at Capital One that's being represented here. So there is – there have been several efforts to identify what are the most optimal approaches that we can take from a tooling perspective to make sure people are being productive.
As you know, there's a lot of discussion in industry about how do we have software developers focused on delivering software and not on managing the different components of the tool chain that take them away from what they're focused on, actually delivering value. So over the past several years we've been focused on an initiative to, as much as possible, centralize those functionalities and have people whose day-to-day, day-in and day-out skill set is managing, providing, supporting, and enhancing DevOps tooling for the rest of our enterprise to use, and take that task off of those other folks' plates so they can focus on software delivery, right?
And so through that effort the scale has grown from other teams in the past managing their own DevOps tooling for their teams or their portion of the organization's use case to a centralized platform that we call a common capability. So it's something where, as people are setting off on whatever their project's journey and goals are, it's like, "Hey, this is the tool you need to use to make sure you're aligning with those consistent things that we know we're gonna get reusability, consistency, all of those things out of."
So as far as that – some specifics of that. As we've gone from saying, "Hey, we have disparate tooling," or different people standing up maybe similar tooling in different places of the organization, to centralizing that, we have taken, like you said, over 6,500 users' worth of scenarios and needs and those kinds of things that are encapsulated in over half a million different Jenkins jobs, different use cases, different pipelines, and provided a centralized platform that says, "Hey, the platform is there. It's gonna be highly available. It's gonna meet your needs. We're gonna take care of the maintenance and the functionality and the enterprise-level governance and other requirements that we have as an enterprise to make sure that, in how we're delivering, the quality bar is there," but also take away the need for them to manage that.
So like you mentioned, just to talk a little bit more about the scale. The platform is implemented in a way that seeks to provide a cell boundary or a blast radius segmentation, so that it handles the level of volume of the activities that our users have, and has the ability to segment and make sure different environments are separated and stuff like that. It maintains good separation of duty, separation of environment, and blast radius segmentation.
So the platform is at this point 100-plus controllers and counting, separate Jenkins stacks that each serve a swath of those users we mentioned. Then the other thing from a scale perspective that's pretty interesting: not only are we running a CI tool for people to run a pipeline, but one of the other big value adds and complexities of the scale is the fact that each of those different stacks provides different functionality from a network connectivity and cloud IAM privilege perspective, so that the users of the platform can interact with our cloud infrastructure and interact over various ports and protocols throughout what they're doing and not have to manage that on their own, but also in a way that keeps it segmented. So that's a little bit of context around the scale.
Brian Dawson: While there's a lot I would like to dig into there, I'll keep myself on the rails here. What I found particularly interesting was the number you used, over half a million unique pipelines. Now is it true that they're unique, or do you leverage templating or some other methodology to categorize those half million unique pipelines into a smaller subset of pipelines?
Chafin Bryant: That's a good question. So similar to the journey that we've been on with our centralized tooling, we also wanna provide centralized capabilities, and I think there are definitely significant benefits to the reusability and consistency that come along with having some, what I'll call, pipeline framework that says, "Hey, here's how we do this." So I wouldn't say it's like, "Hey, here's one job. Everybody at Jenkins _____ gonna look like this." But for whatever those known use cases are, "I'm gonna deploy a Java application in a container," or whatever it might be.
For each of these use cases our direction is to have some reusable enterprise-wide components that people can leverage so that – one of my favorite quotes, W. Edwards Deming said, "Uncontrolled variability is the enemy of quality." And so the more you can make consistent, or reduce the number of those different patterns to shared ways of doing things, the more you can improve quality and consistency. So I know that's a roundabout answer to your question, are they all different.
There are definitely a lot of different use cases there, but our goal is, where possible, to inner-source those things and make reusable components so that rather than reinventing the wheel or redesigning a pipeline, people can reuse things to help them get up to speed in delivering value that much faster.
Brian Dawson: Phenomenal. That's awesome. We talked about the unique pipelines, this codification of different use cases or processes across the organization. Moving down closer to the tech stack, when we talk about your Enterprise Jenkins Platform and the level of scale you've achieved, if you're going across business units it must mean you're dealing with a wide variety or diversity in tools that the different teams use. Is there a particular way you've approached integrating some of the bespoke tools that teams have, or getting teams' purpose-fit tools plugged in?
Chafin Bryant: That's a good question. I'd say that in general we try to complement the centralized build tooling that I'm mentioning with the Enterprise Jenkins Platform with other centralized capabilities, so that people have hopefully the same level of ease with adoption and don't have to go, "Hey, I need this. Let me implement it from scratch. Let me set it up. Let me implement it in a way that's gonna have high availability and follow these different enterprise architecture patterns," stuff like that.
We have a wide set of enterprise tools in addition to the Jenkins platform that in general provide those other core functionalities to our developer community. And beyond that, for the N number of other things like you're mentioning that people seek to integrate with and stuff like that, we have tried to take an approach that primarily helps us identify the patterns that have broad reusability and implement those things in a pattern that allows all the users of the platform to benefit from them.
Then from there, on a scenario-by-scenario basis, we implement things where necessary in a one-off way. But in general, I will tell you at that scale you really wanna make sure that you try to keep things simple. Because I often, as I'm talking through things with people, will draw a vertical axis of scale and a horizontal axis of complexity. If you have really high scale of something simple it's like I maybe have – it's a lot more complex to have 10,000 servers than one server, but if those 10,000 servers are, in the industry term, cattle, not pets.
They're implemented in a way that that entire implementation is deployed and configured as code. There's not a lot of complexity there because it works how it works. But if you have 10,000 plus server implementations, you get the worst of both worlds there because it's super large scale and super complex. So definitely one of the learnings there that I would share is: wherever possible, make things reusable and consistent across. I sometimes joke with people when they bring up ideas.
I'm like, "Hey, I'm gonna have to get a snow globe to put these snowflakes in." Snowflake one-off things. People have heard me say that many times. I try to keep snowflakes to a minimum because consistency is key for being able to manage on a huge scale, otherwise it just becomes overwhelmingly complex.
Brian Dawson: Phenomenal. Well, actually you mentioned a lesson learned. So moving on, or continuing on, with lessons: can you share with us some key lessons that you learned? In particular, if you would, I'd like to understand what was, I think, the most impactful thing that you did to achieve this level of success, and hopefully as you talk about lessons you can share some of the positive outcomes that you've seen as a result of this effort.
Chafin Bryant: For sure. So I would say there's a few here and if anybody – I'll plug my other CloudBees event. I did an event for the CloudBees delivering the finance industry a couple of months ago, and I had some tips for success in there. I think there's a couple there that I would call out. The first one is architect for scale and growth, and I really like the quote from The 7 Habits of Highly Effective People, which is "Begin with the end in mind." And my spin on that is there's nothing more permanent than a temporary solution.
So rather than like, "Hey, let me – oh, we'll just do it this way for a little bit," because you never go back and reach back for it. So I would – I recommend, especially if you're starting out greenfield with something new and you have the ability to design it well: you can't solve every problem up front, but you can think about how is this gonna scale. Ask those questions up front, because I also really like the idea of a premortem.
So you hear people talking about doing a blameless post mortem, which is like, "Hey, what went wrong?" The idea of a premortem is sit down now and say, "Hey, it's six months from now. This thing failed. What happened?" And challenge your brain to think through a little bit differently or in a less optimistic way of, "Hey, what are the [break in audio] things that I need to account for?" So that's one. I like the idea of do a premortem and think about the end, where are you going, and make decisions that are gonna support that well.
To echo the one I just mentioned, I said aim for consistency wherever possible, but I like the idea, I think it's attributed to the Navy SEALs, which is slow is smooth and smooth is fast. So it's counterintuitive, but if you start – if you do something consistently you can then increase the cadence at which you're doing something consistently. So I really like that idea of whatever it is you're trying to accomplish, you don't have to make it super fast. You just have to, where possible, automate it so that it's something you can do consistently, because automation is a key driver there.
Then once you've got it automated you can make that process faster, thus accomplishing that consistency – slow down and go fast or slow is smooth, smooth is fast. I think that piece is key. Make sure you know – you focus on how to do it well and then you can do it faster and faster.
Brian Dawson: Great advice. Well, we're coming up on time and before I ask for your final thoughts, I just wanted to take a moment to say thank you for submitting for this award. Congratulations on the award. Again, the awards are given to very few people. Hopefully your team is proud. Then thank you for taking time to share your lessons with our listeners here today. Before we wrap up, do you have any final thoughts that you'd like to share with our DevOps Radio listeners?
Chafin Bryant: I'm glad you asked that, Brian, and that I have the opportunity to say my job is the easy part. All that I do is try to get the roadblocks out of my folks' way. So to every member of the Enterprise Jenkins Platform team, I would give them all the credit there for everything you're talking about and that we've discussed here because they're the ones who make it happen every day. I'm super excited to continue down that journey.
This is only the beginning from a scale perspective with some of the direction we're going. So hopefully we can chat again soon and on the horizon is a scale that makes what we're talking about now seem small. So I appreciate the opportunity to chat. Like I said, massive shout out to my team. They're the ones who deserve the credit for this awesome achievement and I appreciate you taking the time to chat.
Brian Dawson: Yeah. Thank you. We absolutely will. I'm sure we're gonna be talking more later. That statement is intriguing. It'll make – I'll paraphrase, it'll make the scale of today look like little league. I wanna know what you have planned. I will second it to the CloudBees – to the Enterprise Jenkins Platform team at Capital One. Thank you and congratulations as well. Chaf – Chafin – excuse me. Chafin, thank you for your time. I look forward to speaking in the future.
Chafin Bryant: Thanks a bunch. Have a great one.
Brian Dawson: All right. Bye.
Second interview begins
Brian Dawson: Hello, this is Brian Dawson with another episode of DevOps Radio. Today, as a part of our series where we've interviewed DevOps 2020 award winners, we have with us today Riad Ghafir, the global head of production factories at BNP Paribas CIB. Riad is representing the team that is the winner of the DevOps Automation Excellence Award. That award is presented to the customer that exemplifies the most outstanding automated DevOps processes across a team or an organization. Hello, Riad, how are you doing today?
Riad Ghafir: Hi Brian. I'm good and very happy to be with you today and to speak about this award.
Brian Dawson: Likewise. Thank you for joining us and congratulations on winning this award.
Riad Ghafir: Thank you.
Brian Dawson: I'd like to take a few minutes to learn more about what brought you to the point that you've received this honor or achieved this honor of receiving this award, but before we dig into the project, can you tell our listeners a bit about your role at BNP and what your background is?
Riad Ghafir: Yeah. So my role at BNP Paribas, my current role, which I have had for around a year and a half, is global head of production factories. This includes many topics, which have a common theme around automation. So DevOps is a big part of my department. Then, I also manage testing environment management services, data protection services, as well as intelligent automation.
The team is around 80 people distributed in many locations, so Paris, London, India, and Canada, providing support and project services around these activities. My background, so I spent the last 20 years in BNP Paribas, quite a lot of time where I moved from role to role, mostly leadership roles within IT, application production support, infrastructure, DevOps, identity access management and other security topics and finally in the last few years, I was more focusing on digital services and DevOps is one of them.
Brian Dawson: Now, I'm curious, was DevOps an initiative or effort that existed before you took on this role or is it an initiative that came about under your guidance?
Riad Ghafir: It was existing already and it was starting, it was the beginning of the DevOps journey and adventure and I took – I was involved in the beginning, not as a driver of the initiative, but just after a few months, I got this new role and I was already at the leadership of this initiative. The DevOps, I would say the DevOps activities, even though they were not called DevOps, had existed in BNP Paribas CIB for a while in terms of software delivery, software engineering and automation, deployment automation and all the topics around software delivery; however, what has been done in the last three years is to put everything under one umbrella and call it DevOps to deal with the software delivery management.
Brian Dawson: Awesome. So, my understanding is that, as you said, you're the head of a global team and in the area of DevOps, what you guys offer is a universal CI/CD platform that supports roughly 800 project teams and you've had the challenge of supporting a wide range of products or projects or requirements, excuse me. Is that correct?
Riad Ghafir: Yep.
Brian Dawson: So, maybe you could tell me and our listeners a bit about how you achieved automation excellence, frankly, with this project. Can you tell us how this project came about, what some of the keys to your success were and give us a little bit about the overall journey to where you guys are today?
Riad Ghafir: Yeah, so the project started around 2017, late 2016 to early 2017, as a program which was part of the digital transformation for CIB and for the overall bank, and DevOps was really a key stream in this digital transformation. And the aim was to provide a platform and services in order to enable teams, IT teams, dev and ops, to adopt this methodology and also onboard the applications and the projects into the journey. So it started with design work and collaboration with other IT departments like architecture, infrastructure and so on, and we came up with a design for a platform, which is relying on many products and also available in the marketplace in an internal cloud, which is called IB2. The marketplace allows in fact the consumption of these products in a self-service mode, removing all the middle men or teams acting on such a request to install things manually. So the aim of this platform was to be self-service and also to be an enabler to secure the platform. Because we are in a bank, a highly regulated institution, we needed to make sure we go fast, but also that we go fast in a secure mode.
So, we started the journey there. So the first phase, as I said, was design, then implementation of the platform and the choice of the products composing this platform. So of course, we use many products, orchestration management, artifact management, etcetera. I would say on the orchestration side, or the core of the CI/CD platform, we have chosen CloudBees Jenkins as opposed to other alternatives and other tools because it provided a product which was quite well known and still very well known, Jenkins, which started as open source a while ago. So in terms of skills and knowledge, we have the knowledge and it was easy to get knowledge from the market, but also to leverage a vendor and a company to support us in this journey because we knew it would be challenging. It's a big environment. We spoke about 800 projects or 800 applications, but it's not the end. We are still onboarding and the journey still continues. And also, there are different, I would say, cultures, different technology stacks, different distribution of teams. We have teams which are centrally located. We have teams which are distributed amongst a lot of locations. We have teams organized in different ways, waterfall versus agile, and so on, and also the applications don't have the same criticality or the same constraints.
We have front office applications like electronic trading applications, which care about latency and speed in a secure way still, and there are also ecommerce applications that have other constraints and back office applications, which have also other constraints in terms of criticality on payments, etcetera. So there is really a big scope to cover and we really needed a platform which scales in an industrial way and is also highly available and performing very well. And we achieved quite good results when we started deploying in production. The platform has been in production for around two years now, between two years and two years and a half.
And we got very good results with some projects where for example, the deployment time was divided by ten, sometimes 15 times. And also we had really a good buy-in from some teams where it was also an opportunity to get dev and ops together. It was the case for some projects, not for all, and our initiative made them work together and these things are continuing now.
Brian Dawson: Riad, that's fantastic. Thank you for sharing that detail and congratulations on those successful outcomes that you've seen.
I'd like to ask, is there one lesson that you learned during the course of getting to where you are today with this effort that you can share with our audience?
Riad Ghafir: Yeah, so if there is one lesson, there are many, if there's one lesson that I would share, it is in fact that running a project like that, it's all about humans, I would say, and culture, because the adoption of a platform is not about adopting a technology. It's about adopting a way to work and also getting the buy-in into a project or a program like this. It was not always obvious. It would not be true to say it was all smooth and everything was good and everyone bought into it very quickly.
There were a lot of fantastic teams who really were enjoying doing – being onboarded into the platform from day one. There were others which were not really seeing the benefit, so we had to demonstrate, to convince them of what it would bring to them, and especially when we talk about speed, it's not always that we can divide deployment time by ten or by fifteen. Sometimes that's not the only reason to use these toolsets and DevOps in general. For me, the focus shifted from speed only to more, now it's more about security, control, quality and speed.
And what an automation, I would say, program or initiative can bring now is also securing either a transaction or a deployment, because doing things manually of course is error prone and difficult to audit, difficult to trace, etcetera. Now when you have a tool chain which is fully integrated and fully automated, it makes things very easy to run and also easy to trace, to audit, etcetera. And in a bank, it's very important.
Brian Dawson: So, Riad, it's been quite a journey, you know, a multiyear journey. You've arrived at this place where you and your team have been recognized with this award. I think, as you said earlier, I'm sure you're not done yet, so I'm curious, what's next for DevOps at BNP?
Riad Ghafir: No, DevOps at BNP, I would just transition from what I said already. I talked about security, so we are currently still onboarding projects, of course, but also I would say not refreshing, but enhancing our platform and adding some other services. And there would be a focus in the next few months on dev sec ops in order to add the security, I would say, dimension more importantly into the platform. We already of course use security components because we need to respect the segregation of duties between dev and ops without jeopardizing the automation. So we use privileged access management tools, single sign-on, etcetera. However, now our focus is more on adding security tools for – I would say security testing tools, SAST tools and SCA tools, etcetera, in order to make sure that when a team or a project consumes this platform or this service, they would have an end-to-end process including a way to analyze their code, to clean their code and to make sure what is delivered in production is clean and good and in a very automated way.
Brian Dawson: Awesome. Well, hopefully as you guys get further along your dev sec ops journey, you'll check back in with us and update us on your project, maybe even submit for the DevOps Innovation Awards next year. Riad, congratulations to you, congratulations to your team, congratulations to your organization on this award.
I really appreciate your time. I really appreciate you taking a moment to share your journey with our listeners and I hope to follow up with you again in the future as you guys progress.
Riad Ghafir: Yeah, thank you very much and thank you for this podcast, for this award, and I would take the opportunity to thank my team because it was a huge effort and a team effort on all occasions, and as I already said internally, having this award in 2020, it has a significant meaning. It's a very special meaning for me because it was in a context which is really challenging, to say the least. So my team, which is distributed in five locations, has been working a hundred percent remotely since March with some hectic episodes, I would say, at the beginning of this situation. It's very important to have this team recognized for their efforts, and also, thanking CIB and BNP Paribas, CIB for the opportunity to work on this digital transformation and for the opportunity to work on this very exciting project in order to make sure we deliver something useful for the business, for the bank.
Brian Dawson: Phenomenal. Again, congrats to you. Congrats to your team. Congrats to your organization and the others that you thanked. Riad, I hope you have a great day.
Riad Ghafir: Thank you.