Your company probably isn't Facebook. Statistically speaking, this is obviously true. Only a fraction of those reading, if any, work at Facebook. But I'm also talking more philosophically, from a technical capability standpoint. Perhaps you work somewhere that views IT and software as a cost center. Or maybe you ship software products, but on a much smaller scale. Whatever the differences, you probably don't face the size, scale, and scrutiny that they do. And, on the flip side, you also probably lack their resources. But you don't need to be Facebook to ship software like Facebook. What I mean is, you can learn from them and adopt their techniques. Elements of their approach are, in fact, perfectly suited to your world. You just need to adjust some of your views of what's feasible and some of your approaches. Let's take a look at some lessons you can learn about shipping software.
Use Feature Toggles to Have More Production Flexibility
I recently took a look back through the history of software releases. Early in my career, we mailed out CDs, and this eventually graduated to web deployments, remote update capabilities, and cloud-based architectures. Underpinning much of that progress is a fairly basic concept, and we should start there. I'm talking about the idea of a feature toggle. A feature toggle is just a little bit of conditional logic that lets you turn something on or off in production. Seriously, that's it.
We developed this capability back in the days of mailing out CDs, using Windows INI files to let users beta test new features. "Do you guys want to try a new feature?" we'd ask customers with the software. "Just go into that INI file and change NewFeature=no to NewFeature=yes, and you're all set!" The feature toggle followed the IT world into web applications, then mobile, and then the cloud. And Facebook took advantage of it in spades as it did so.
The lesson for you: get new features to market quickly, and use feature toggles as a risk mitigation and rollout strategy. You could code such a thing yourself, the way they did. But in this day and age, feature toggle systems exist, and you should take advantage. Feature toggles not only reduce risk, but they also let you ship at much smaller granularities, making releases much less eventful.
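To make the INI example concrete, here is a minimal sketch of a home-grown toggle in Python. The NewFeature key comes from the story above; the [Features] section name, the is_enabled helper, and the checkout strings are assumptions made up for illustration, not anything Facebook or a particular toggle product uses.

```python
import configparser

# Hypothetical INI contents. Only the NewFeature key comes from the story;
# the [Features] section name is an assumption for this sketch.
SAMPLE_INI = """
[Features]
NewFeature = yes
"""


def is_enabled(config: configparser.ConfigParser, name: str) -> bool:
    """Return True if the toggle is on; missing toggles default to off."""
    return config.getboolean("Features", name, fallback=False)


def checkout_flow(config: configparser.ConfigParser) -> str:
    """The 'little bit of conditional logic' that the toggle controls."""
    if is_enabled(config, "NewFeature"):
        return "new checkout"
    return "old checkout"


config = configparser.ConfigParser()
config.read_string(SAMPLE_INI)
```

Defaulting unknown toggles to off is a deliberate choice here: a typo in a flag name then hides a feature rather than exposing it by accident.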
Split Test New Functionality to See for Yourself What Works
Once you have feature flag management in place, you can start doing some additional, sophisticated things with it. One good example is split testing, or A/B testing. The concept is pretty straightforward. Let's say you write software for an e-commerce site and you're about to push a new shopping cart feature to production. Should the shopping cart be red or blue? Decisions, decisions. You could debate it internally, getting into heated and animated discussions. Or you could do a lot of deep research into the psychological properties of red and blue imagery. But you could also just run an experiment. Ship both the red and blue carts to production, show each to half of your users, and see which one triggers more purchases. Facebook and similar companies take advantage of this technique, and so can you. In fact, Facebook supports this concept strongly enough to let you do it yourself, with Facebook advertising.
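If you're curious what the red-versus-blue experiment might look like in code, here is a rough Python sketch. It assumes each user has a stable ID and assigns a variant by hashing that ID, so the same user always sees the same cart. The variant_for function, the SplitTest class, and the "cart-color" experiment name are all hypothetical names for illustration; a real split test would also want a statistical significance check before declaring a winner.

```python
import hashlib
from collections import Counter


def variant_for(user_id: str, experiment: str = "cart-color") -> str:
    """Deterministically assign a user to 'red' or 'blue' (roughly 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "red" if digest[0] % 2 == 0 else "blue"


class SplitTest:
    """Tallies views and purchases per variant (hypothetical helper)."""

    def __init__(self) -> None:
        self.views = Counter()
        self.purchases = Counter()

    def record_view(self, user_id: str) -> str:
        variant = variant_for(user_id)
        self.views[variant] += 1
        return variant

    def record_purchase(self, user_id: str) -> None:
        self.purchases[variant_for(user_id)] += 1

    def conversion_rate(self, variant: str) -> float:
        views = self.views[variant]
        return self.purchases[variant] / views if views else 0.0
```

Hashing the user ID rather than picking randomly on every page load keeps the experience consistent for each person, which matters if you want the purchase numbers to mean anything.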
Tier Your Rollouts Instead of Doing Big Bang Deployments
Another idea that builds on the back of feature toggles is the idea of a tiered rollout. Facebook specifically talks about this particular technique in this post.
"Each release is rolled out to 100 percent of production in a tiered fashion over a few hours, so we can stop the push if we find any problems."
This means that they take a given feature they want to promote into production and initially show it to only a small fraction of users. If that goes okay (more on what "goes okay" means shortly), they increase that percentage. As long as they encounter no issues, they continue doing that until the entire user base sees the new feature. The same way that you can easily implement feature toggles, you can also take advantage of this, even if you're pushing features to mobile apps or other non-web contexts. It'll take some adjustments to how you plan and handle your releases, and you'll need some tooling to help you, but this is very attainable for everyone.
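As a sketch of how a tiered rollout can work under the hood, the Python below puts every user in a stable bucket from 0 to 99 and shows the feature only to buckets below the current percentage. The stage percentages, the function names, and the is_healthy callback are assumptions for illustration; this is not a description of Facebook's actual tooling.

```python
import hashlib
from typing import Callable, Sequence


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for this feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def sees_feature(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket(user_id, feature) < percent


def run_rollout(stages: Sequence[int], is_healthy: Callable[[int], bool]) -> int:
    """Walk through the stages; return the exposure percentage we end at.

    is_healthy is a hypothetical callback that checks tests and metrics
    at the given exposure. If any stage looks bad, stop the push and
    return 0, meaning the feature is switched off for everyone.
    """
    exposure = 0
    for percent in stages:
        exposure = percent
        if not is_healthy(exposure):
            return 0
    return exposure
```

Because the bucket is stable, anyone who saw the feature at 10 percent still sees it at 50 percent, so raising the dial only ever adds users.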
Automate Your Tests. Everywhere
If you'll recall, I just promised to talk more about what it means for everything to "go okay" when you're deploying new functionality. Generally speaking, "goes okay" means "passes all sorts of automated tests." Automated testing comes part and parcel with speeding up the cadence of released functionality and with making releases much finer grained. And it absolutely has to be automated. In the article I linked about Facebook's rapid releases, they talk about pushing 10,000 diffs to production in a single week. Can you even begin to fathom the person-hours that manually testing that would require?
In order to get to the point where you ship to production with serious cadence, you need to dramatically ramp up your automated testing. Of course, you need unit tests for your codebase. But you also need integration tests, end-to-end tests, and acceptance tests besides. And you're not finished there. On top of the standard tests for correctness, you also need things like performance tests, load tests, and smoke tests. These tell you about the holistic behavior of your application in production-like conditions.
But executing tests in staging doesn't finish your job. You need actual tests in production as well, making sure things go smoothly after you push the functionality live. After all, do you think sites like Facebook or Netflix just have a staging environment lying around that does anything remotely like simulating their prod environments? That'd either be impossible or so expensive as to bankrupt them. So if you want to release software like these companies, you need to build automated tests early, often, and everywhere.
Use Anomaly Detection to Prevent Weird Disasters
Automated tests are absolute table stakes, but even they don't cover everything. Consider the following tale I once heard. An e-commerce application pushed an update to production. It passed every unit test, integration test, and every other test imaginable. It ran through everything I detailed in the last section without any issues whatsoever. And then it landed in production and everything blew up.
Actually, everything blew up only metaphorically speaking. In reality, nothing happened. And I mean nothing quite literally. The update went live and their purchases dropped immediately to zero and stayed there. Nobody was buying anything. At all. What was the issue? It turned out that an accidental change to a CSS setting had made the background of the "buy" button the same color as the background of the page, effectively hiding it. There was absolutely nothing wrong with the site technically... except that it completely didn't work as an e-commerce site anymore.
Anomaly detection is how Facebook and similar companies combat this. It means automated checks that ask questions like, say, "Did our rate of purchases plummet immediately after this release?" In the general sense, you put automation in place to detect anything out of the ordinary so that you can look into it further if it occurs.
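Here is one very simple way to automate the "did purchases plummet?" question, sketched in Python. It compares the purchase rate observed after a release against a baseline from before it and flags anything more than a few standard deviations away. The function name, the threshold of 3, and the sample numbers are assumptions for illustration; production anomaly detection usually also accounts for things like time of day and seasonality.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Flag the observed value if it sits far outside the baseline.

    baseline: e.g. purchases per minute before the release (2+ samples).
    observed: the same metric measured just after the release.
    threshold: how many standard deviations count as "weird" (assumed 3).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical purchases-per-minute readings before the release.
purchases_before = [98, 102, 100, 97, 103, 101]
```

With a check like this wired to the release, the invisible "buy" button from the story would trip an alert within minutes instead of waiting for someone to notice the revenue chart.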
These Things Are All Sophisticated But Accessible
If you take anything from this post, take away the lesson that you can do all of these things in any kind of shop. Facebook is doing them on a scale that would probably dazzle you and with a level of sophistication that would make for fascinating conference keynote talks. But you need neither dazzle nor keynote sophistication to improve your own deployment pipeline. Set sensible goals for improvement, and force yourself a little outside of your comfort zone. Force yourself to deploy more frequently or in smaller granularity. Divert some resources away from manual testing efforts, forcing your hand at automation. Resolve to run more split experiments in production. And then, alongside these goals, adopt the techniques that Facebook and others have pioneered. You may not hit the conference circuit with your results, but you'll get things to market quicker and have happier users.