Alan Ranciato - Feel the Need for DevOps Speed
In this episode of DevOps Radio, we're at Jenkins World 2017 with Alan Ranciato of Express Scripts. He'll discuss his role at Express Scripts helping to standardize IT in a large organization, and the different technologies that help foster that.
Sacha Labourey: Hello, it's DevOps Radio even though today it's not quite radio: it's radio and YouTube. And we are at Jenkins World with Alan Ranciato. Hello Alan.
Alan Ranciato: Yes. How you doing Sacha?
Sacha: Very good. So we are at Jenkins World so it's pretty noisy around but I think the audio systems must be pretty good. So Alan you're working for Express Scripts. Can you tell us more about what you're doing and say a few words about Express Scripts?
Alan: Okay. Express Scripts itself is the largest PBM, which is a Pharmacy Benefit Manager in the US. We are number 22 on the Fortune 500 – really the largest company nobody's ever heard of. They are the company that you will see on your prescription card when you go into the pharmacy. And they manage the plans and they manage that. And they also have a pretty large mail order pharmacy. So anything you order – any like diabetes meds and things like that that come and ship to your front door: we do that.
Sacha: Oh wow. Okay. How many employees?
Alan: I think about 30,000 nationwide, about 5,000 in technology.
Sacha: Wow, so not quite your average startup.
Alan: No, they're pretty large. A pretty large enterprise, grown a lot over the years. We are number one – CVS is our main competitor right now – and the other company that people have heard of.
Sacha: Very nice. Okay. And so you've been in technology for a long time?
Alan: A while.
Sacha: You still look young – that's not what I meant – but you've been around for a while and you've worked for startups, larger organizations – can you share with people a bit your background maybe?
Alan: Sure. I've been in technology since before Y2K and I sat around and watched the panic. Started out on the networking side of larger companies, did a lot of consulting working in the CRM space, did a couple things on my own, worked for some larger companies. Most recently I was with American Express for about five years and then about a year ago I moved over to Express Scripts. And I was brought onto Express Scripts to really drive the strategy around DevOps and automation as part of their agile transformation.
Sacha: Okay. So tell us more about how Express Scripts was before you joined and when they called you essentially what was the objective? What was the stated goal?
Alan: So they are extremely large and they've grown over the years via mergers and acquisitions. So they've got a lot of little pockets of technologies that have been acquired through different companies, anywhere from mainframe to older WebSphere stuff to .NET to some pockets of VB6 to –
Sacha: That's a lot of fun.
Alan: – now we're Spring and React on an internal private cloud, but just merging it all together, nobody had a clue. So just from the standpoint of how we do things there were no standards because we did have these little pockets of technology, you know, we had Jenkins running in PCF and that was our newer stuff, which was awesome, but then you'd find old instances of Hudson, you'd find people e-mailing binaries back and forth, you'd find just little things like that that we had to really not even centralize but standardize from the standpoint of being able to move faster and knowing what we're doing.
Sacha: So the stated objective was really to move faster, to be more competitive through software. Who initiated that decision that faster would be a good thing?
Alan: Our CIO. And so it did come top-down. Our CIO is Neal Sample, he came to us from American Express – I actually worked with Neal over there – eBay, Yahoo – started out over here – and so we had a relationship and he basically said, "Come on over, let's do what we did at AMEX," because we did a great job over there and we needed to do it over here and we needed to come up with a strategy to actually bring it all together because there are a lot of different verticals and a lot of different silos in the organization and we really needed to know from an organization-wide standpoint, if we're going to go from waterfall to Agile, how we're going to do that and how we're going to support it with the technology, because you can't really move fast unless you can move better.
Sacha: So how do you approach this? Because I can picture myself going in a company where you have 5,000 developers and you have pretty much all stacks you can imagine, right, from I guess Visual Basic to AIX to mainframes to whatever and you need to make sense of this and start organizing things. So where do you start?
Alan: Low-hanging fruit. A big part of it was let's figure out where we can get quick wins and actually show the agility, show how quickly can we get to CI, how quickly can we get to CD? And so how do you do that? It's a little bit more difficult on the legacy platforms that no one's ever been able to deploy code to without actually opening up a Remedy ticket and having stuff done manually. So, we started out with our cloud platform and took the teams that were in there – how can we actually get to continuous integration, testing, deployment, actually move on and then automate pieces of the release process and bring that all together? Once we can see the benefits of that then we can start pushing those benefits to other parts of the organization. And we really tried to streamline the onboarding. You know we used Jenkins a lot, built out global pipeline libraries to enable teams, with three lines of code, to get onboard and to help us actually build out the structures for what they need to do – how their code is built, how their platforms are built – and then we would work with them on the actual deployments and work with them on integrating into the rest of the systems and that, so we actually had visibility into it. We're still along the journey: we're not there yet, but it's made a lot of progress.
Sacha: When you're talking about making quick wins, so it feels when you're saying this that there was a need to essentially socialize success and show that it's possible or – it feels like there's something else than just obviously improving metrics but there was a cultural aspect you were trying to initiate here.
Alan: Yeah, I'm a salesperson.
Sacha: All right.
Alan: I'm a reluctant salesperson. But it's my job. So no, I do spend a lot of time working with teams. I do spend a lot of dog and pony show time actually getting teams onboard, but I also do a lot of hands-on to actually show them how easy it is to do it, which is the fun part of my job.
Sacha: Yeah, so you have huge diversity within the company – you have lots of acquisitions that took place, so probably people with very different backgrounds, very different cultures. How much resistance did you face from some of those teams?
Alan: The funny thing is it wasn't that bad.
Alan: I actually had probably more resistance when I worked at American Express, where we were a vertical in American Express – that was Enterprise Growth. So we were a little bit ahead of the larger organization – ahead, from the standpoint of technology was a little bit more up to date – we were a startup that got acquired. So I probably had more resistance to change in that type of culture because we did have people who thought they were doing it right to start. At Express Scripts there was actually a lot of pain and a lot of pain both in process as well as just not knowing it could be better. Because you've got people that have been there for 20, 30 years that just hadn't really seen it done and then you show them how it can be done and they're like, "Wow, that's actually really cool." So the resistance really wasn't that bad. It was – there's always going to be some, but for the most part it's been – people have been accommodating, they really have.
Sacha: All right. And do you feel like all applications, all environments were able to benefit from more CI/CD or do you feel like there's still some area of the business – maybe mainframe or I don't know what other areas where it's harder to get there?
Alan: Yes. Mainframe is definitely the elephant in the room, so it is the difficult part. Sadly – I mean with some of the tools we have, the mainframe stuff isn't actually – the CI stuff isn't really the long pole: it's more of the process stuff, which is interesting. And we've actually decided to go down the path of using Jenkins for that and building out pipeline workflows from Jenkins from an intake perspective. Just having input forms that we can then send to our XL Release platform, that actually get them in the pipeline and get them queued up for release, is much, much faster, and takes a lot of the data out of having to click open Excel spreadsheets and go through the ServiceNow process of gathering it and then manually putting all of that into a format we can actually use. And then as we move forward we start automating that more and more.
Sacha: I see. So what does your infrastructure look like today? What does your solution look like from a DevOps standpoint?
Alan: Jenkins wise?
Sacha: Yeah, let's start there.
Alan: Okay. We are in the process of moving to the Enterprise platform – or the Jenkins –
Sacha: CloudBees Jenkins Enterprise Platform?
Alan: CJP right now.
Alan: We've been in the process for about six months moving from – because we were running Jenkins 1.6x on Pivotal and being that 2.0 is not supported we've had to move over. So we're in the process of going there. Extremely intrigued about the CJE Platform and we're really looking at that because we are so diverse and we are so spread out with just a lot of different teams and a lot of different things we're doing – I can see the benefit from a stability standpoint and a scalability standpoint to going that route right now.
Sacha: Are you using containers?
Alan: We are. We are using containers more for build agents and dependencies, not so much in like a Swarm or Kubernetes concept but just on our nodes to actually build out what we need so we can actually do these and provide different versions, different – because the stack is so diverse – I've had people asking for a JDK6 and everybody else is running JDK8 – and before I tell them no, we have to figure out if we want to do that. I don't want to start going and installing dependencies and having to manage that across our nodes. So using containers for that sort of stuff, using containers for Ansible to be able to actually have a deployment platform that we can containerize and not have to worry about different things and being able to split – spin-up and isolate really all of our builds and our infrastructure around that.
Sacha: So how do you use Ansible with Docker?
Alan: We actually run Ansible within Docker and make the connectivity out to our target hosts.
Sacha: And you're using it to actually build the image from within? Is that what you're doing?
Alan: No, no, no, no, no. We are not on that type of infrastructure. This is more for actually automating the deployments. So it's deploying applications, it's not actually building out infrastructure at that point. We are in the process of doing it – and we do have Ansible not necessarily within Docker but from Jenkins building out things like our JIRA platform and actually doing a full destroy and buildup on those things via playbook. But Ansible was our solution to actually touch the rest of the enterprise with our deployments. And it was kind of a "How do we standardize how we actually touch and get applications out to systems that we were never able to touch before? How do we secure that and how do we manage it so we just don't open up ports across the board and do all of that?" So we went with that. In order to do that we were able to standardize our Ansible roles – actually have global roles out there that we're using Pipeline to pull in as part of their builds, keeping all of their target hosts and inventory within our code repo, which is nice because now we're not asking somebody else for the spreadsheets to determine where this application is actually deployed to. So everything happens through the build process all the way through, which is really, really nice.
Sacha: Yeah. What about the cloud? Are you already using the cloud?
Alan: Being a large healthcare company we have problems with that. So we do have PCF in-house – so we do have on-prem cloud – and we've been going that route and little by little as we start to componentize our platform, moving over to microservices – do more of a PaaS platform – we are doing that. We're going with Spring, React, a lot of Node stuff – actually moving it into there.
Sacha: Right. PCF as in Pivotal Cloud Foundry.
Alan: Yes, right.
Sacha: Okay. Just to make sure people…
Sacha: All right. So we had a discussion before where you were expressing how you structured things centrally, versus distributed, and we've talked with a lot of companies that are more centralizing things and offering this as a service. In your case you took a slightly different approach and you really wanted to empower your teams. So can you talk about that and why you've made those choices?
Alan: So I've worked in a lot of environments where developers write code, developers tell QA to test code. Three weeks later the code comes back to developers, it has to go to production anyway because it was three weeks later – or it goes to production and it breaks and then there's an operations team that has to triage it and then a production support team that tries to fix somebody else's code – and that's great when you're working in waterfall and you're on a three- to six-month release cycle.
Alan: When you're working on two-week sprints and actually trying to get teams to deploy every day, you can't really have other hands in your code: you have to own it. And you have to own it end-to-end. You have to know what it does, you have to know what happens when you deploy it if something goes wrong. You have to know how to support it, once it's in production. So by pushing that empowerment out to the teams and really just giving them the tools that they need to be successful they actually own it: they know how it works, they know how the deployment goes, they are responsible for putting in the telemetry to actually see what's going wrong with their code, and they're responsible for signing off on it and saying, "All right, I'm accepting responsibility. Let's watch this thing go out. Let's bring it back out. Let's bring the old version." So without actually giving them the ability to do that when you're in an environment that's audited seven days from Sunday you still have to have some sort of framework around that while you give them the control. And that's a lot of what we've done. So we've centralized the framework but we're really giving them the ownership of doing it.
Sacha: I see. All right. And so have you been able to measure in some fashion the progress that you've achieved essentially through those changes from the time when you joined, to today? Are you able to quantify some progress?
Alan: We have. You know, we've been using several different platforms to actually measure what we're doing, you know, throwing data into Graphite on our builds, throwing into Splunk on some stuff, but more importantly just stopwatch time and watching what it takes from the point of code check-in to the point it gets to production. And we've gone from the point where it would take two weeks to get there – in a good case – like the best case it would take two weeks with something like 30-something hours of stopwatch time, to it taking a day, with an actual 45 minutes of stopwatch time, from check-in to dev to QA to approvals in production. And to get to that point is really where we see the big win, because it's not data coming out of it, it's really what the results are in the end.
Sacha: I see. So just before I was talking to Stefan from ABN AMRO – a bank in Europe – and he was giving me this interesting metric to quantify – before the move to DevOps – and it was the time to release a Hello World application. So starting from zero you need to deploy your Hello World – how much time does it take you? So how much time do you think it would have taken Express Scripts before the transition to do that Hello World acid test?
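The two numbers Alan contrasts here, elapsed time from check-in to production versus hands-on "stopwatch" time, can be computed from stage timestamps. The following is only a minimal Python sketch of that idea: the `Stage` class, the function names, the stage names, and every timestamp are hypothetical and chosen for illustration. They are not Express Scripts' tooling or data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Stage:
    """One step a change passes through on its way to production (hypothetical model)."""
    name: str
    started: datetime   # when work on this step actually began
    finished: datetime  # when this step completed


def lead_time(checked_in: datetime, stages: list[Stage]) -> timedelta:
    """Wall-clock time from code check-in until the last stage finishes."""
    return max(s.finished for s in stages) - checked_in


def stopwatch_time(stages: list[Stage]) -> timedelta:
    """Sum of time spent actively working, ignoring the waits between stages."""
    return sum((s.finished - s.started for s in stages), timedelta())


# Made-up timestamps, picked only so the arithmetic is easy to follow.
checked_in = datetime(2017, 8, 30, 9, 0)
stages = [
    Stage("dev deploy", datetime(2017, 8, 30, 9, 5), datetime(2017, 8, 30, 9, 15)),
    Stage("qa deploy and tests", datetime(2017, 8, 30, 11, 0), datetime(2017, 8, 30, 11, 25)),
    Stage("production deploy", datetime(2017, 8, 31, 8, 50), datetime(2017, 8, 31, 9, 0)),
]

print("lead time:", lead_time(checked_in, stages))
print("stopwatch time:", stopwatch_time(stages))
```

Keeping the two measures separate shows how much of the elapsed time is waiting (queues, approvals, handoffs) rather than work, which is the gap the automation is meant to close.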
Alan: It took several days for the most part and that included filling out support tickets to get somebody to move your code and whatever the SLAs are.
Sacha: That's not too bad. All right.
Alan: You know, the actual process to get all of – I'm sorry, that was QA for deployment – the actual process to get to production is usually weeks.
Sacha: Okay. They were at six months, initially, so –
Alan: Oh wow. Okay.
Sacha: Yeah, six months for Hello World.
Sacha: So I'd love to drill down a bit into the culture – the cultural aspect of this transition – and maybe if you can share some of the tricks. You said you're a sales guy essentially but what did you see work? What did you see was creating more defensive behavior and that should be avoided? So any tricks you can share?
Alan: I'd say what works best is kind of helping the teams and actually getting their involvement. And that's part of the reason we inner-source our pipeline code, is so that we could get teams actually owning it and being a part of it. And so most of our teams know what they're doing better than I know what they're doing. I know what it should look like from a "These are the steps we need," but what actually happens within that container that is their build, test, deploy piece in their workflow, I don't necessarily know what's best for this application or that application as much as just a high-level picture of how I need it to look.
Alan: So getting in and actually helping I'd say garnered a lot of buy-in with it.
Sacha: I see. And so if you had to redo something that you don't think you managed well what would it be? Did you think there is an obvious mistake you made at some point that you regret now?
Alan: I'd say probably rate of change is tough, especially during the transition. You've got a lot of teams that are trying to do a lot of different things and we tell them to do something one way, but then we have to pivot for one reason or another and then having them have to change to support what we're doing causes a lot of churn.
Sacha: Do you think you went too fast or you were not ready at the time you decided to move forward? What do you think was a mistake?
Alan: I think working in a culture that is not used to agility there are times that things were too fast.
Sacha: I see.
Alan: You know, coming from a culture where iterating is key it's expected.
Alan: But I think working with a lot of teams that aren't doing that and aren't used to that that causes a lot of churn.
Sacha: Yeah, I've heard that feedback quite a few times – not just in the context of DevOps but there is almost attached to any organization some natural clock that you have to respect and if you go too fast then you're putting that system under pressure and you're not getting results. So, over time, you can increase the drumbeat but if at any point in time you try to go too fast, bad things happen.
Alan: Right. I mean we talked about just doing a whole lift and shift and realized that just would fail miserably. You just can't go about it that way, the teams aren't ready for it, and just the whole having to stop, turn, figure out what's actually being done is just impossible for them to catch up and actually still move forward with their work. So it's been a gradual progression. It's definitely not as fast as I would like it to be but it's probably faster than a lot of people are ready for.
Sacha: Yeah. It's a team effort, so, yeah. So you seem to have done an amazing amount of work and progress in a relatively short timeframe. So what are your next challenges you'd like to see solved?
Alan: I would like to get to CD. I really would. I would like to remove all of the human interaction from the process. That's a challenge, especially in a regulated environment, but I would like to see that. And even if it's not across the board and we all know it's a risk-based decision.
Sacha: Pick your battle.
Alan: Yeah. But if it's – you know, we're talking front-end applications, we're talking things at the top of the stack that actually aren't going to impact the way people receive their medication – that's what I would like to see the world go to and that's what I would like to see us do is to be able to actually go full CD, remove all of the stopping points, remove all of the minutiae and all of the unnecessary approvals and checkpoints because we have automation in place.
Sacha: And what do you think is going to be the biggest challenge for you through that journey?
Alan: Test automation. Test automation. We are a – have been traditionally a very manual environment. We do – we have ramped up a lot, we do have test automation in some areas but not nearly enough to be comfortable across the board doing something like that.
Sacha: All right.
Alan: And I guess another part of it is being willing to fail: how do you fail fast and actually recover from it versus having to be perfect every single time?
Sacha: All right. So some closing thoughts. Where do you think DevOps is going? What do you think is the next big thing in DevOps? What's your view on that?
Alan: I think it turns more into, like, strictly cultural change. I think the centralized DevOps role kind of goes away, because I think a lot of the tools are there more, now, to the point where they are almost self-supporting. Where they are plug and play and they do all the things we need them to do, so there’s not necessarily a team that has to really own that and there's support for it, but I think it becomes more enablement and I think teams actually have to own that and the culture moves more and more that way, versus centralizing the DevOps role.
Sacha: All right. Well thanks man. All right. Thanks a lot Alan.
Alan: Awesome. Thanks so much.
Sacha: And enjoy the show.