Dean Yu - Picture-Perfect CD

Tuesday, 29 November 2016

In this episode of DevOps Radio, we’ll hear from Dean Yu, director of release engineering at Shutterfly. We’ll hear about his background at Yahoo, the mobile strategy at Shutterfly and how to achieve picture-perfect continuous delivery.

Andre Pino: You’re listening to DevOps Radio, the podcast series that dives into what it takes to successfully develop, deliver and deploy software in today’s ever-changing business environment.

Sacha Labourey: This is DevOps Radio. Sacha Labourey, CEO at CloudBees. So Dean, you are director of release engineering at Shutterfly?

Dean Yu: Right.

Sacha: So I’d love to hear more about what you do there, but as importantly, you’ve been a member of the Jenkins community since 2008 and you’re part of the Jenkins government board. Right?

Dean: Yeah.

Sacha: So, a governance board. So, let’s talk maybe a bit about your participation in Jenkins first.

Dean: Sure.

Sacha: Two thousand eight, that’s a long time ago.

Dean: Yes.

Sacha: That was not even Jenkins back then.

Dean: It was not.

Sacha: That was Hudson.

Dean: Yes.

Sacha: So, tell us a bit more how you got there.

Dean: Sure. So at the time, I was working at Yahoo and I was on the Java platform team at Yahoo. So Yahoo formed a team of Java expertise because Java was becoming more and more used within Yahoo. Yahoo traditionally was a lot of C, C++ and PHP, but our ad systems were primarily a Java stack and they wanted – and then there were other teams that were using Java too. So they wanted kind of a platform to kind of unify all the different odds and ends. And so we were – that’s what I was working on. And it’s kind of the worst reason to actually pick something up, but, you know, one of the teams was working with Hudson and they said, “Hey, this might be generally useful. You should be using continuous integration throughout the company, but we don’t have time to support it. You guys are the Java platform team. This is written in Java. Why don’t you guys take a look?” And you know, that’s kind of how we picked it up. And as I was looking through it, we kind of have a process for taking external software into Yahoo, so we said, “Well, you know, it’s going to have these security problems. We have to integrate it with our single sign-on. We’re going to have to be able to support these, you know, Yahoo proprietary technologies.” And so what we really did was kind of we forked it. Typically whenever Yahoo forks something, they put a Y in front. So at the time, it was Y Hudson. And so we had to do all these things to integrate it with our browser based single sign-on, USAR, equivalent of drudes and make it work with our special kernels and our two-inch JDKs. And so I started popping up on the Hudson Dev List asking _____, whoever was listening, like, “Hey, I’m trying to do this integration. How do you do this? I’ve been in this chunk and it doesn’t make any sense.” And so that was kind of how we started with Hudson. We knew that since we were doing a fork, we didn’t really want to diverge. We’ve had experience forking and then not keeping up and –

Sacha: That’s hard.

Dean: It was horrible. So, it was a conscious effort. It was a very conscious effort to make sure that whatever we did that was not specific to Yahoo, we contributed back upstream. And then after a while, you know, Kohsuke kind of reached out to me. It’s like, “Hey, you know, it’s kind of interesting what you’re doing. Why don’t we get together and talk about what you’re doing and what you’re trying to do it for. Some of the stuff is kind of weird, but then I saw your domain name, so I kind of figured out why it was weird.” So, you know, we started doing interesting things with Hudson at the time. I started writing plugins for things that we need that could be open sourced. I worked a lot on the Subversion plugin because we were using Subversion at Yahoo. We did kind of a couple of extra features to be able to kind of filter what you’re polling on. And then our console logs were gigantic for some of the builds. So I wanted a way to be able to collapse it into sections, so I wrote a plugin for that. And then there was the first Jenkins user conference. Well, actually I guess there was a fork first, but then the conference. So they asked me like, “Come talk at it.” So we talked about what we were doing with Jenkins at Yahoo. Let me back up a little bit. When the fork happened, it was, you know, we were outsiders. We were not – we tried not to get involved in the politics, whatever was going on in the background.

Sacha: Everybody tries not to be involved when there is an issue with Oracle. Yeah. They have big feet. So –

Dean: Yeah. But it was obvious. You know? If Kohsuke is going this way, that’s where the activity is going to – that’s where the activity is going to happen. So, that was our choice. It wasn’t a hard decision to make. Like, you know, if there’s a fork, we’re going to follow Kohsuke and the community. And so we did that. We still called it Y Hudson inside because we didn’t want to go through the effort of changing the package name.

Sacha: Oh, that’s funny.

Dean: Because it still has to be deployed and all that stuff. But then, you know, after a while with all of the turmoil and the turnover that Yahoo was going through, you know, I was kind of on my fourth CEO in a matter of years and we were trying to introduce CI at a company level and it was kind of going in fits and starts. And like, it’s enough.

Sacha: Right.

Dean: So, I was looking around. I wanted to find a job where I could still continue to work with the community, so that kind of needed someplace that used Jenkins or Hudson and I thought I wanted to go back to Apple at some point, but then they didn’t use, you know, Apple is Apple. They didn’t do any of this stuff. And then Shutterfly kind of came up and it was like, “This is exactly what I’ve already been doing. It’s a great fit,” and that’s kind of been that and now it’s been four years. I just had a four-year anniversary this week.

Sacha: So, tell us a bit more about what is Shutterfly doing because I’m not sure it’s necessarily clear for everybody, especially outside of the US.

Dean: Sure.

Sacha: And what’s your role there?

Dean: Yeah. So Shutterfly is an online photo management and memory management company. We talk about enabling people to share life’s joy. We want people to take their pictures, kind of the typical how you keep your memories. We’ll store them for you digitally and then we want you to be able to put your pictures anywhere. You know, this coffee mug us _____ son and daughter.

Sacha: Oh. I’m looking at a nice mug right now with lots of pictures. Right?

Dean: Yes. And my vice is Philz Coffee. And so all these pictures were taken of my kids either at Philz or with a Philz coffee mug in there.

Sacha: Oh, that’s funny.

Dean: And so when I go there in the morning, they all recognize my mug.

Sacha: That’s great.

Dean: And so, you know, it’s uploading photos, being able to create physical products out of those photos. Photobooks are always our biggest seller. The holidays is our biggest quarter. People are sending out Christmas cards, holiday cards, but we have mugs, pillowcases. We just came out with this thing which is putting a picture – it’s portraits, but it’s printed on metal and it’s amazing. It’s like this kind of anodized aluminum and the colors just pop.

Sacha: Nice.

Dean: So we’re always looking for more things we can put pictures on. Some of the things I don’t get. I’m surprised how well they do, but primarily it’s – I guess I’m in the wrong demographic for that. Shutterfly as a company has been around for 17 years.

Sacha: Wow.

Dean: So it’s been around for a long, long time, so our tech stack is actually pretty old.

Sacha: And it’s mostly in the US or –

Dean: It’s US. Our market is primarily US. We haven’t really tried to expand internationally yet. Part of it is architectural. Part of it is we have enough to do in here. And we’re actually, you know, our headquarters is up at Redwood City behind Oracle and we have a second office, which is actually right across the street here.

Sacha: That’s funny.

Dean: So, I can park over there and walk here for the Jenkins Conference.

Sacha: Yeah, so for the record, here is Santa Clara Conference Center because this is actually being recorded during Jenkins World 2016. Yes.

Dean: That’s right. So, that’s what Shutterfly is. Shutterfly is a Java – Shutterfly is a Java stack. It’s a JSP, Tomcat, Java, Oracle. So pretty old stuff. We’ve been actively trying to modernize it and break things apart. I’m director of release engineering, so from when I joined, you know, all the releases were coming through my team. We had a monthly release cycle. Every month we had a major release of the website with all the features, all the new products that we wanted to sell. It required code coming from five or six different development teams and they all had to be ready at the same time.

Sacha: Oh, right, so it’s really a gating of everything and a _____ big bang, essentially.

Dean: Very _____. Yes. So, it was – you know, a painful process. If one is late, everybody has to wait and then as typical everything else after the development check in gets compressed. Right? And as we try to do more projects, as we do acquisitions, more and more is kind of falling on my team to try to figure out. And so I knew, you know, we kind of had to change our model for how we deliver software. We couldn’t just be that central release team that bottlenecked everything. One, we couldn’t scale and we couldn’t keep up, but two, there was so much that was still manual that we had to fix. So that’s kind of been my big focus over these last four years is figuring out how to keep the policies and controls we have in place for SOX compliance, for PCI compliance because we’re dealing with credit cards. We’re dealing with very personal information, but allow developers to have that feeling of ownership all the way out to production, so try to figure out how to break that monolith up.

Sacha: But so, it means that sometimes release engineering is being perceived as a role that tends to be late in the process, but it seems, which it is, but it seems you’ve been very influential at modifying the process itself much earlier in the cycle on how – and almost like an architect, you know, of how software gets done. Right?

Dean: Yes. It’s – that’s very astute and that’s my background. Right? I actually was finally an architect at Yahoo before I left, so I still put on that hat a lot. And you know, for me, it’s practicing continuous integration. Right? The feeling, the belief that every commit should ultimately potentially wind up in production. And that changes how developers think about what they’re committing and when they commit. We don’t want developers to wait a week before dropping a whole chunk of code on. We want them to kind of slice and dice and kind of be more fine-grained. So, you know, there are definitely challenges with that monolithic code base, the monolithic architecture. It’s very easy for one team to check in something that has an effect on another team, especially when everything comes together into this one test environment that we have. So, we have a lot of processes. There’s always this tradeoff between how fast you can go and how stable it is. Right? And so for a long time, we were really valuing stability over timeliness, but it took a long time to get things done. So I’ve been trying to find that middle ground. Part of it is breaking things apart so you can get faster feedback on your component, have that tested, then have that integrated and do coarser grain tests within that shared environment. So it’s been successful. We’ve had to do a lot with Jenkins, a lot to kind of beef it up. In my talk this morning, I was saying how when I started, we ended 2012, we did about 50 releases to production that year. And that was all done by my team. This year, through August, we are already at 300 releases out to production. And that’s because my team doesn’t do a whole lot of it anymore. We still do that monthly release. There’s still a lot of code that’s on that monthly cycle, but a lot of the things that have been broken out into services, they go when they’re ready.

Sacha: Right. Okay. So it means you’re – it seems like you’re on your way to a destination where maybe you remove completely this monthly release or you think that’s going to be an unbreakable wall?

Dean: I think it’s going to get to a point where it’s diminishing returns. This is something that I talk to my chief architect a lot about as we’re trying to figure out our two and three year roadmap. There is still a lot of low hanging fruit and a lot of things that can be broken out. And we have development teams that are doing that. And there are parts of the architecture that will be really hard to solve, really hard to actually just break apart. And it might – we don’t know yet. It might get to the point where we say this is manageable to do on a monthly basis. Or we may say, you know, we’re just going to leave it behind and eventually replace it with something else.

Sacha: So as a SaaS, it’s typically a perfect scenario to do continuous deployment not just delivery, but truly deployment, and it doesn’t seem to be the strategy you’re aiming for even if you were to further optimize based on what you have. You seem to say that’s not necessarily where you want to go. Right?

Dean: Well, I think it’s – continuous deployment is a strategy that we can use and if you think about continuous deployment as using the same automation in every environment, even in production, we do that. Right? Developers – we have a bunch of fixed environments for development, for testing, for load performance testing, for user acceptance testing and all the way out to production. We deploy to all of them the same way. It’s completely automated. We have three updates of our test environment every day. It just happens. We don’t even actively worry about it. The only time we have to worry about it is if someone says, “Hey, we missed that 1:00 build. We have one more change we’ve got to get in. We’ve got to kick it off again.” It’s all scheduled through Jenkins, through triggers and things like that. When we get to production, we follow the same script, but we run each step manually because we have to interact with our operations. We don’t have root permissions, which is what’s required still to do some of the things in production. So there is that last part that we still have to do manually. So, continuous deployment is there, but for us, it’s really – it’s not the end goal. Our end goal is to break things out. For the things that are services, we do continuously deploy those. We have – like some of our creation paths, where you’re actually editing and creating these types of products. They are now purely HTML5. They run completely on the client. They don’t need Flash or any _____ server. So, those are actually continuously deployed. They actually – those teams practice continuous delivery. They push everything out. It goes all the way out to production and they have a way to control the URL to say which version users get of their build.

Sacha: I see. All right. So it truly depends on the specific application. Right? It’s really à la carte. Okay.

Dean: Yeah.

Sacha: And all right. I had a question that just dropped from – oh yes. So, you were talking about low hanging fruit and which made me think, what metrics are you using to define what’s the low hanging fruit? Essentially, it seems like you’re trying to improve something. What are the top metrics that keep you awake at night?

Dean: So, hopefully none because I don’t want to be awake in the middle of the night.

Sacha: Right, that’s a good point. Yeah.

Dean: I would say that, you know, again with – our main concern is trying to gain efficiencies around that monolith. We can certainly break more and more things out. So we have metrics around how long it takes for teams that have broken themselves out and kind of created services, how long it takes for them from check in to actually releasing that commit out to production. So we can do that and with Jenkins, we can see how often their builds are failing. We can talk about how many releases they’re actually doing. It’s kind of interesting. While we’ve provided the capability to actually do continuous deployment, continuous delivery, teams have kind of settled into a twice a week or once a week type of actual release where they actually switch the thing. It seems to be – because we follow a two-week sprint model, that seems to be kind of a natural cadence. You know, we do have a few teams which are truly practicing continuous delivery where they’re switching multiple times as they need to. But for the monolith, that’s kind of the thing that worries me the most because it’s so old and so difficult to work with. We’re talking about one code – one Java code base that is comprised of maybe seventy Maven modules in this gigantic Maven build. And it has to go to 14 different types of servers, so different groups of servers. In non-production, we have a smaller scale of that. Production, we’re talking about five or six hundred servers. Right? And it all has to go at once.
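
The metrics Dean mentions here (time from check-in to production release, and how often builds fail) can be sketched roughly as below. This is only an illustration, not Shutterfly's tooling: the team names, timestamps and record layout are invented, and the result strings simply mirror the familiar Jenkins "SUCCESS"/"FAILURE" labels.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (team, check-in time, production release time).
releases = [
    ("cards", datetime(2016, 9, 1, 9, 0), datetime(2016, 9, 1, 15, 0)),
    ("cards", datetime(2016, 9, 5, 10, 0), datetime(2016, 9, 6, 10, 0)),
    ("photos", datetime(2016, 9, 2, 8, 0), datetime(2016, 9, 2, 10, 0)),
]


def median_lead_time(records):
    """Median time from check-in to production release, per team."""
    by_team = {}
    for team, committed, released in records:
        by_team.setdefault(team, []).append(released - committed)
    return {team: median(deltas) for team, deltas in by_team.items()}


def failure_rate(results):
    """Fraction of builds whose result is anything other than SUCCESS."""
    if not results:
        return 0.0
    return sum(r != "SUCCESS" for r in results) / len(results)


lead = median_lead_time(releases)
rate = failure_rate(["SUCCESS", "FAILURE", "SUCCESS", "SUCCESS"])
```

The median is used rather than the mean so that one unusually slow release does not dominate a team's number; that is a design choice of this sketch, not something stated in the interview.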

Sacha: Wow. Are you using the cloud?

Dean: We are trying to figure out how to use the cloud, but some of it is tied to our architecture and our storage. To be able to store all these photos, we actually have custom storage solutions. So we have very, very cheap storage and to say, just to move it into the cloud, our expenses go way up. So that’s kind of one of the big things we struggle with is like yes, cloud provisioning, elasticity is something we would love to get, but there’s that tradeoff of what do we do with our storage.

Sacha: I see. I see. Right. That makes sense. Yeah. Yeah, it’s a comment I hear. Sometimes companies can’t move to the cloud because – or for the obvious bad reasons, right? But sometimes you have some businesses that optimize for a very specific thing. Could be the cost of storage. It could be latency. Could be a number of things. And those represent much harder problems to solve to move to the cloud. So, what’s very interesting in everything you say, it’s really you’re focused on CI/CD, but as you said, you’re an architect at heart and that’s really visible in how you talk and how you solve problems. So the key word on everybody’s mind these days is containers.

Dean: Yes.

Sacha: So, is this having an impact at Shutterfly?

Dean: It is. So, last year, we switched our Jenkins to run as containers. We did that, one, so we can actually manage the instances better and in the talk that I was giving at the conference today, I was actually talking about our setup and what we were trying to do to be able to recover from outages that we’ve had. As we do more and more releases every year, you know, these outages have more and more impact across the organization. An outage back in 2013, no one noticed. And outages we have this year, everybody notices. Right? And so we – the important thing is the data that Jenkins creates in the Jenkins home directory. So we have that on network file storage and so for us, it’s how quickly can we get a new head stood up with the right software or with the right plugins, with the right configuration that we can just map onto the network storage as a source of truth. As we break out software and we want to be able to create pipelines for all these different components, we don’t want to be the bottleneck to create the pipelines, my RE team. So we’ve actually used the Job DSL plugin to let developers define their pipeline. So we’ve kind of got policies so that all the boilerplate is handled for them. They just have to declare, you know, this is our repo. This is our, you know, our team name. These are the notifications and just kind of the component name, if you will. So it’s pretty boilerplate stuff and we control everything else with global policies. And so what this gives – what this is giving us in terms of managing Jenkins is now if Jenkins goes down, and what I mean by down is not that it hangs and we just have to restart. That happens and it’s not the end of the world. It’s if we have a disastrous loss of the hardware, how quickly can we recover from it and get everything back. And so containers have helped us do that.
We treat everything in the containers as the source of truth, so the build metadata is on that network file store, but everything else, we actually restore from the containers every time Jenkins starts up. It actually helps enforce good discipline. Jenkins is really good with its UI. You can add a plugin here, add a job there. It’s one of the benefits of Jenkins is it’s so easy to get up to speed on. But as you get more and more of this stuff, it becomes that ease of use is kind of a double-edged sword. It becomes really hard to manage, really hard to trace. So by basically taking the idea of configuration as code, we’ve moved all that stuff into version control and we will have a pipeline to actually produce the container that holds all the plugins. We’ll have a pipeline that produces the system configuration. We have a pipeline that produces the _____ that has the Jenkins software itself and we bind everything together. This has given us also the ability as developers define their pipelines, they can basically have a clone of exactly what we run in our production instance on their desktop with Docker Machine to test their DSL locally before they commit it. So we don’t have a lot of broken jobs running on our production instance.
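
The division of labor Dean describes, where a team declares only its repo, team name, notifications and component name and global policy supplies the rest, might look something like the sketch below. The real setup uses the Jenkins Job DSL plugin, which is Groovy-based; this Python version only illustrates the idea, and every field name and policy value in it is an assumption.

```python
# Hypothetical global policy owned by the release engineering team.
GLOBAL_POLICY = {
    "build_timeout_minutes": 60,
    "keep_builds": 30,
    "stages": ["build", "test", "package", "deploy"],
}

# The only fields a development team is expected to declare.
REQUIRED_FIELDS = ("repo", "team", "component", "notifications")


def expand_pipeline(declaration):
    """Merge a team's minimal declaration with the global policy."""
    missing = [f for f in REQUIRED_FIELDS if f not in declaration]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    overridden = sorted(set(declaration) & set(GLOBAL_POLICY))
    if overridden:
        # Teams may not override centrally controlled settings.
        raise ValueError("policy fields cannot be set: " + ", ".join(overridden))
    job = dict(GLOBAL_POLICY)
    job["stages"] = list(GLOBAL_POLICY["stages"])  # avoid sharing the list
    job.update(declaration)
    job["name"] = f"{declaration['team']}-{declaration['component']}"
    return job


job = expand_pipeline({
    "repo": "git@example.com:cards/checkout.git",  # placeholder URL
    "team": "cards",
    "component": "checkout",
    "notifications": ["cards-dev@example.com"],
})
```

Rejecting declarations that try to set policy fields is one way to keep the boilerplate centrally controlled; whether the actual setup enforces this the same way is not stated in the interview.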

Sacha: So yes _____ keynote, _____ spoke about this project he wants to work on about essentially making Jenkins fully configurable through files. Right?

Dean: Yes.

Sacha: That seems to be talking –

Dean: There’s a lot of stuff that Jenkins – that Kohsuke talked about that I wish he’d talked about two years ago, to be honest. A lot of the stuff that we’ve – so, while I was listening to the keynote, I was thinking, “We’ve kind of already solved this.” But the problem is, we’ve solved it in a way that’s specific to Shutterfly. We’re using – all the things that we use are off the shelf open source components, but it’s wired for Shutterfly.

Sacha: He admitted to me that he secretly has a prototype working somewhere on his machine. So – but don’t repeat that.

Dean: Okay. So, you’re his boss, so - So that’s, you know, containers have helped us in terms of managing Jenkins. Where the real benefits for the organization though are built – is our build farm. What we’ve had to do before with our static slaves is any time, you know, especially as we’ve kind of broken things out, you know, they want a special tool or use a new technology for their build or their project and that means we’d either have to install that software onto our build slaves or give them a special slave. So that’s where containers have really helped us is now we actually allow them to specify their build environment as a Docker image and we’ll actually just run it on our cluster. So our slave cluster right now is, you know, eleven physical machines. You know, we just get these from our operations guys because they manage our physical infrastructure anyway. So they’re pretty beefy. And it’s more than enough capacity to just run these jobs and spin up and spin down these containers as we need. And so it’s been really easy for developers to be able to say, “I’m going to use Java 8 instead of Java 7.” And we don’t have to worry about having – taking the time to make sure that that’s available for them.
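
To make the "build environment as a Docker image" idea concrete, here is a minimal sketch of how a build step could be wrapped in a throwaway container. It assumes the standard `docker run` options `--rm`, `-v` and `-w`; the image name, workspace path and helper function are hypothetical and are not taken from Shutterfly's setup.

```python
import shlex


def docker_build_command(image, workspace, build_cmd):
    """Build the argv for running one build step in a disposable container."""
    return [
        "docker", "run",
        "--rm",                          # remove the container when the build ends
        "-v", f"{workspace}:/workspace",  # mount the checked-out source
        "-w", "/workspace",              # run the build from the mounted directory
        image,
        "sh", "-c", build_cmd,
    ]


cmd = docker_build_command(
    "example/java8-build:latest",        # hypothetical team-supplied image
    "/var/jenkins/workspace/cards",      # hypothetical workspace path
    "mvn -B clean verify",
)
print(shlex.join(cmd))
```

Because the toolchain lives in the image, switching from Java 7 to Java 8 becomes a change to the image name in the team's own configuration rather than a change to the build machines.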

Sacha: What about production?

Dean: Production? We are not there yet. Right now, I would say that our Jenkins instance is the mature – the most mature user of Docker containers within the company. So a lot of other teams are actually looking to us for guidance about how to use Docker in production and we’re working with our operations team on that. Part of the thing I think is there is still a lot of confusion about containers versus virtual machines and we’re still kind of going through the process of teaching people. You know, you’re not just – you don’t really just dump everything that’s on this physical host into a container image and say, “That’s it.” We’re trying to figure out what the best way is to break things up into different containers that we can wire together. For me, Docker – there’s actually two parts of Docker. Right? There’s the software description and definition and deployment and then there’s the scaling of the deployment. And for us, I think the first baby step that we have to take is to actually just be able to model our software as containers and just kind of run it the way we do today which is still a host with the set of processes. And if we need more, we’ll have to add more hosts and run more containers on those hosts. I think just being – just jumping directly to the next step where everything’s in the swarm, I don’t – I think there’s work we have to do in our architecture before we can do that.

Sacha: Right, so you’re really using Docker as a – almost like a unit of packaging.

Dean: Yes.

Sacha: A cleaner way to do things, but not yet, you’re not into the _____ versus Mesos, versus Swarm and letting things orchestrate themselves on the data center.

Dean: And I think even for the use case that we have where Docker is solving a lot of problems for us because we’ve had this issue where there’s a layer of software that kind of sits between the application and the system and it’s a really grey area about who’s responsible for that.

Sacha: That’s very true, yeah.

Dean: And so for – with Docker, we can say, “All right, my team’s going to own it. We’re going to control it. We’re just going to put it into a container and we’re done with it.”

Sacha: Yep. Yep. All right. That’s interesting. And so in terms of resistance to change because you’ve done a lot of changes, especially as you break those applications into smaller applications, I know developers sometimes have this feeling of ownership and territory and so on. So can you talk about this resistance to change?

Dean: So, I think that there is – at Shutterfly anyway, it’s not so much resistance as lack of opportunity. Because of the nature of our business where it’s so seasonal, everyone’s focused on making sure everything is ready so that our Q4 goes smoothly. And that leaves very little room – historically it’s left very little room for us to make investments to be able to modernize, to be able to move forward. And so it’s got – it had gotten to the point where it’s actually really painful to make changes because our build time took an hour to build this model _____. Then it took another hour to deploy it somewhere. Then we had another two weeks of qualifications. So it’s a very painful thing. So, giving development teams the opportunity to kind of break their stuff out and kind of be out of all that delay has really empowered them and they were ecstatic to jump aboard that. It hasn’t been all kind of peaches and cream. It’s a learning experience for them as well because now they are dealing with things that they’ve never had to deal with before. “What do you mean I’m going to get paged in the middle of the night for this thing?” “What do you mean that this other piece of code that I’m depending on has changed without me knowing it?” So there’s still a lot of monolithic thinking within the organization, even though we’re changing the way that we deliver and even develop software. So that’s something that’s still in progress.

Sacha: All right. And I suspect Shutterfly must have a mobile application?

Dean: Yes.

Sacha: Right? So when did you start with the mobile application?

Dean: Mobile for us came through an acquisition and so for a long time, we kind of left them alone because they were kind of fairly separate. They’re a totally different stack. And they’re actually able to use the cloud because they were using Heroku and all these more modern things. They were kind of a startup. This year – but as the years have gone on, we recognize that more and more of our revenue, of our business actually comes from mobile instead of the standard desktop site. So this year, we’ve actually made a big, big effort, it’s our biggest project of the year. We call it integrated commerce, to actually unify the commerce that’s on the website versus what you get in mobile. And we actually – it just went live last week. I believe it’s last week. Hopefully I haven’t pre-announced anything. And another part of that is, you know, we’ve done a very serious upgrade of the photo management part of our site too. If you are a long time Shutterfly user and you go upload your photos and you kind of look at your albums, it’s a very 2005 look. It’s kind of these thick lines. It almost looks like it’s Flash because part of it was Flash. Very slow reloads, full page reloads. It was not dynamic. It was not kind of what you’re used to in a modern webpage. So, for the last year, year and a half, we’ve been – we call it Shutterfly 3.0. It’s a brand new site, way of managing your photos. So that is now completely integrated with our mobile – with our mobile application too. So, in some sense, we’re kind of competing with Facebook and iCloud photos in that we want your photos in our cloud, in our storage, not anywhere else because then once we have your photos, it’ll be really easy to create those products.
You know, we have a team in Haifa in Israel, so that actually presents its own challenges too, but they’re doing a lot of kind of, you know, image recognition, facial recognition, so they’re able to kind of pick out your best pictures out of the things that you upload and kind of say, “Here’s how your pictures would look on a mug. Here’s an auto created photobook for you,” to kind of really make it easy to do the stuff. And you need that on mobile because you’re not going to want to actually create a photobook on your little phone. You’re not going to want to sit there and position your photo just so, you know, so it lines up here. So that’s a big part of our mobile strategy.

Sacha: Right. And so do you feel like mobile forces those teams to work differently or are they being pushed to be more agile than other teams by virtue of being mobile or do you think it’s kind of all of the same anyway?

Dean: It’s all of the same anyway. I think what it’s been pushing is because the mobile team itself has been kind of separate for a long time, so they were always able to go at their own pace. But now with this big integrated commerce project, it’s forced a bunch of our existing development teams, especially the teams that deal with the commerce, to think about the fact that they’re going to have clients that are not part of that monthly release with them anymore. It may go ahead, it may fall behind. So it really kind of forces them to think about their APIs that they’re providing and not breaking compatibility and not assuming that everyone can upgrade at the same time.

Sacha: I see. I see. Another topic that comes up a lot these days, maybe it’s who I’m speaking to. Who knows? So I was wondering if you had an opinion, what about the serverless architectures, the so-called serverless architecture. What’s your take on that?

Dean: I think it’s a lot of hype. I’m not quite sure where it’s at yet, because ultimately, someone has to have a server somewhere. Right? And the way I think about it is like, yeah, sure, you don’t have to run your own servers, but it means Amazon’s running your servers or Google’s running your servers or Microsoft is running your servers. I think there are principles. There are principles coming from infrastructure as code, configuration as code that make that possible, but regardless of who runs your servers and who owns your servers, that’s going to help you out anyway. So, from my point of view, the way we do deployments, one of our big pain points right now is the infrastructure setup is still very manual by our operations team. That’s not something that we’ve figured out how to automate yet. Cloud can certainly help us with that, but for us, it’s getting that automated, getting all that captured, you know, in version control so we can replay it. To me, that’s kind of the more important thing and then once we have that, it becomes less important whether we do it to our own infrastructure or whether we do it to a cloud provider or things like that. The whole idea of bringing, you know, of spinning instances up to run code for a little bit and tearing it down, that’s a problem that we’re not even worried about.

Sacha: Right. Yeah. With the scale you have, I think it’s – it’s just a _____ issue anyway.

Dean: Right.

Sacha: Yeah, yeah. So we had CI, CD, _____ extension, a lot of companies, the so-called unicorns and new companies, deployment as well. What do you think is next? How do you see this space evolving for always more agility, always more, you know, yeah.

Dean: Yeah. I think it’s interesting that we talk about DevOps and we talk about this type of agility, but from what I’ve seen of it at Shutterfly and a lot of other talks, it’s still very focused on the technology organizations. We have to bring business into the fold. It shouldn’t be DevOps. It has to be BusDevOps. I think Gene Kim said the same thing because right now, there’s – we talk about breaking down the walls between developers and operations and even release like it should be one continuous thing, but there has to be – and we talk about giving feedback on the business value, but the business has to see the problems that happen too and right now, they don’t see that. They just know, you know, the site went down overnight. And they were kind of merrily sleeping through it and afterwards, they want to know what happened, “What are you going to do to make sure that it doesn’t happen again? How could you let this happen,” type of thing. I mean, but what they don’t think about is, you know, sometimes it’s their requirements that cause it. You know, it’s last minute changes that people try to accommodate because it’s good for the business. You know, they have skin in the game too. So one of the things that I’m trying to convince our business team to agree to is when there’s a site downtime, it doesn’t just page operations and development teams. It pages them too. They need to be – they have to sit at the table to understand what’s actually happening. They need to understand and see –

Sacha: I’m sure they’ll be very happy.

Dean: I’m sure they will be ecstatic. It’s kind of ironic because on the one hand, everything that I’ve done so far is to make sure that my team does not get paged. But now, talking about like everyone else should get paged. So, we’ll see how that goes.

Sacha: You know, I like the fact you’re talking about that because I always – that’s why I tend to speak more about continuous delivery than DevOps in some cases because I feel like it’s easier to extend the notion of continuous delivery and the feedback loop to include the business. But it’s true. I see a lot of organizations very focused on optimizing the way IT works, but it’s like, you had the black box. It used to work before and now it works better. But fundamentally, it did not change the equation of the business. And you need to include the business. And the business wants it. That’s what they say, actually, but they’re all – it’s also very comfortable to be on a 12-, 18-month release cycle because you have those months _____ cycle. You know, you have those big strategic decisions. You shift your idea to IT, they work on it and you know, it’s relatively peaceful from that standpoint. Once you start telling them, “Now, you’re going to have bimonthly feedback loops and we need to potentially change plans based on this and so on.” That puts a lot of constraint on the business as well. Right?

Dean: Absolutely, and I think depending on the type of people that you have on your business side, some might be technical, some might be not, but they may not be able to understand what’s happening. For a long time, Shutterfly was very focused on revenue, again, because of the seasonal nature of the business, we had to make sure we hit those revenue numbers so we could actually have money for the following year. But, so all the projects were like, you know, these are the types of things we think will make money in this Q4. They don’t care how it got done, but then that just meant that the development teams had to pile on more and more technical debt and it gets harder and harder. But on the other hand, the development teams, they don’t stand up and say, “We need the time to properly invest in this so we can do not just this thing, but the next thing.” And so I think it is – it has to be a conversation that happens both ways and I think both sides can do better at it.

Sacha: Do you think it requires – to be able to bridge that gap, it requires the use of _____ kind of smarter analytics because today, I feel like the IT analytics are very focused on technical numbers. But at the end of the day, it’s revenue. Right? So, if you were able to capture revenues that get generated on one piece of code versus the others, that kind of thing, it could help evolve the business?

Dean: It’s actually interesting. We have that. We actually –

Sacha: All right.

Dean: Because our – because of our seasonality, our patterns are actually very predictable. So we could actually say – we can actually predict which days in Q4 are going to be our peaks and how much money we expect to make on those days, and we’re usually very, very close. And so our business people do rely on that. They can actually look, you know Q4 is actually very interesting for them too ‘cause it’s like, are we making our money? Are we meeting our targets? So they can tell, you know, we’re a little bit short on this day. You know, and then they try to figure out what happened. Maybe we have to do more promotions. Maybe we priced this too high or too low. Maybe we did a bad projection of how popular this particular product is. But, to them, it stops at that revenue number. You know? They don’t think – if, for example if we kind of cap that in our capacity, that kind of limits how much money we can make. But then that gets too far below what they can understand. Right? So there’s still that middle ground that we have to be able to translate.

Sacha: But, say what you could do, you could say, “Okay, we’re going to work on the new features you ask, fancy shopping cart.” So you work on this, but what you could do is do A/B testing. So you deploy the new version to 50 percent of your customers and after a week or a month, you say, “Well, with the old version, we made a million. With the new version, we made 900k. We made less, so maybe it’s time to roll back.” And maybe that type of discovery that includes revenue numbers would help. Do you do some of those?

Dean: We do some of that. Our technology is not at the point where we can do that across the board. For some of the more modern pieces, we can do that. So we can actually test, we can experiment on different creation paths and how that affects monetization and conversion. So we do some of that. And as we kind of break stuff up, we have the flexibility to do more of those experiments. The challenge has always been when it was a change in the Java code that was in this monolith and it required restarting all these servers, it was very, very hard to do those types of tests.
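
To make the A/B split Sacha describes concrete, here is a minimal Python sketch. It is an illustration only, not Shutterfly's implementation: the function names, the `new-cart` experiment label, and the use of a SHA-256 hash for bucketing are all assumptions. It assigns each customer to the old or new version deterministically, totals revenue per version, and flags a rollback when the new version earned less.

```python
import hashlib
from collections import defaultdict


def assign_variant(customer_id, experiment="new-cart", treatment_share=0.5):
    """Deterministically bucket a customer (hypothetical scheme).

    Hashing the experiment name together with the customer ID means the
    same customer always lands in the same variant for this experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "new" if bucket < treatment_share else "old"


def revenue_by_variant(orders):
    """Sum revenue per variant; `orders` is an iterable of (customer_id, amount)."""
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[assign_variant(customer_id)] += amount
    return dict(totals)


def should_roll_back(totals):
    """Return True when the new version made less than the old one.

    Comparing raw totals mirrors the example in the conversation; a real
    analysis would normalize per customer and check statistical significance.
    """
    return totals.get("new", 0.0) < totals.get("old", 0.0)
```

With the figures from the conversation, `should_roll_back({"old": 1_000_000, "new": 900_000})` returns `True`.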

Sacha: Right. Right. Okay. So a lot of people here today at Jenkins World are actually from companies where they don’t do much DevOps yet. You know, they all want it, otherwise, they wouldn’t be here. They face lots of different _____ of those technical, political and so on. What would be your advice on, you know, you’re talking to a person that wants to initiate change, but they feel resistance internally. What would be your advice as to that person?

Dean: I think it’s kind of like that quote from Gene _____. It’s like, it’s not the change. It’s not the change for change’s sake, but where are you trying to get to? Kind of having that vision and an expectation of what you want the outcome to be and it’s not about releasing more often. It’s not about more releases. Within Shutterfly, we actually talk about one day, we don’t want to talk about releases at all. It’s just, you know, experiments. It’s just – a code change is an experiment. It works, you turn it off – if it doesn’t work – if it doesn’t work, you turn it off, if it works, you roll it all the way out. Right? But you have to be able to talk in terms of outcomes and not activity. You do activity to achieve an outcome. And I think I’ve had a lot of conversations with a lot of people at the open source hub. They ask me, it’s like, “Well, would you use Jenkins this way or would you use Jenkins this way?” Well, you know, “How come you’re not using Pipeline? How come you’re still using DSL?” And for me, it’s about the outcome that we’ve gotten to. I wanted to be able to turn around releases. I wanted to be able to allow teams to be able to do their own releases. I wanted my team not to be the bottleneck. Those are outcomes that we did all this activity to support. And people talk about dashboards. People talk about, you know, visualizations. I’m like, well, you know, again, what’s the outcome of that information that you’re trying to radiate out? If it’s just people talked about, you know, I want to see a giant dashboard with all my pipelines. Like, whether they’re – every pipeline is green or red. I’m like, well, what’s the outcome? What is a green pipeline telling you? Nothing because it’s green. So we don’t show that. So, it’s a lot of thinking about, you know, what it is that we’re trying to achieve, what information people need to achieve that goal and then working from that to what are the steps we actually need to accomplish.
And I think for my advice to people, thinking about continuous, even from continuous integration to continuous deployment, to continuous delivery to DevOps, it’s like, you know, what’s the outcome you’re trying to achieve within your organization and then find the practices, the processes, the tools that best suit that.
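
Dean's point about dashboards, that a green pipeline tells you nothing and so is not shown, can be sketched in a few lines of Python. This is a hypothetical illustration rather than Shutterfly's tooling: the `PipelineStatus` type, the `"green"`/`"red"` labels, and the pipeline names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineStatus:
    name: str
    status: str  # hypothetical labels: "green" for passing, anything else needs a look


def needs_attention(pipelines):
    """Keep only the pipelines that are not green, preserving input order."""
    return [p for p in pipelines if p.status != "green"]


# Example: only the failing pipeline is surfaced.
pipelines = [
    PipelineStatus("photo-books", "green"),
    PipelineStatus("checkout", "red"),
]
attention = needs_attention(pipelines)
```

The design choice is to start from the outcome (someone needs to act) and display only the information that drives it.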

Sacha: Right. Okay. Anything else you’d like to share with us? The wise words of Dean.

Dean: Speaking to CloudBees specifically, I think you guys are doing a great service. You guys, in a certain sense, are kind of stewards of Jenkins now because you’ve got Kohsuke, you’ve got so much of the Jenkins expertise working for CloudBees now. I would say from the perspective of the governance board member that doesn’t work for CloudBees, you know, thank you for trying to cultivate the community. Thank you for bringing all of this together. But I think that one of the things we have to be careful of is, you know, the open source project has to stand on its own to support you, but to be able to support any potential other vendor that might want to do something similar. And you know, you guys have certainly been very conscious about that, very deliberate in giving the community that freedom and I hope that we continue to have a really productive partnership.

Sacha: Thank you, Dean. I’m going to say that it’s something we talk a lot about internally. Maybe from the outside, it’s not very visible because, well, you never know what happens within an organization. Right? But I can tell you, it’s not a topic we once discussed and never talked about anymore. It’s something that comes up very frequently. We’re trying to be very conscious about this church and state separation. And we’re always listening and we know that as soon as you have more than one person in a room, it’s hard to satisfy everybody, but we really try to do our best to do that. I was at JBoss before. It was a very different approach to open source and communities. I then joined Red Hat where it’s very different as well and I think we’ve been able to – I hope _____ struck a good balance between church and state and I’m not saying, by the way, _____ versus church or state. That’s always a thing you don’t want to say.

Dean: Yes.

Sacha: And yeah, I hope it’s going to stay this way for the good of the community and obviously for the good of CloudBees.

Dean: Yeah.

Sacha: Thanks Dean.

Dean: Thank you.

Sacha: Thanks a lot.

Dean: Yep.

Sacha: Bye-bye.

Andre Pino: Like what you’ve heard today? Don’t miss out on our next episode. Subscribe to DevOps Radio on iTunes or visit our website at CloudBees.com. For more updates on DevOps Radio and industry buzz, follow CloudBees on Twitter, Facebook and LinkedIn.

Sacha Labourey

Sacha was born in Neuchâtel, Switzerland and graduated in 1999 from EPFL. It was during Sacha’s studies at EPFL that he started his first consulting business - Cogito Informatique. In 2001, he joined Marc Fleury’s JBoss project as a core contributor and implemented JBoss’ original clustering features. In 2003, Sacha founded the European headquarters for JBoss and, as GM for Europe, led the strategy and partnerships that helped fuel the company’s growth in that region.