Why and How You Should Test in Production

When developing and releasing software, the standard workflow is to build, then test locally, and finally release the software through a deployment. Usually, testing is confined to lower environments, with the expectation that production is reserved for customer use only. But what about testing right in the production environment?

For some, this is going to sound strange. Testing in production? Isn’t that reckless? Of course, it can be if you don’t follow best practices, but companies such as Netflix and Facebook apply these strategies to safely deploy functionality to millions of subscribers. How do they do it? By adopting a feature flag deployment strategy. This allows for flexible deployments with more granular control over software functionality, so they need less lead time to get features into production.

If your team is testing in production with the help of feature flags, you can also learn from real user behavior and feedback and course-correct quickly (if necessary) in a repeatable and automated way—leading to better customer retention. Testing in production is a key component of progressive delivery, a newer iteration on the concept of continuous delivery that takes the focus on release speed and quality and shifts it from the code level to the feature level. By focusing on features and testing in production, you can de-risk your deployments even further. 

To begin the journey to test in production, you need a fairly mature deployment strategy. It can also require initial buy-in for a flag management tool, especially if one isn’t already in place. This article will cover the major pros and cons of testing in production as you consider whether it’s right for your organization. After that, we’ll walk through the main steps to help you start testing in production yourself.

First, let’s look at some of the benefits of testing in production.

Benefits to Testing in Production

Provides Greater Testing Accuracy on Deployments

When it comes to testing new functionality, there’s no better place to do so than the environment where it will actually be used. Lower environments tend to come with either inexact data or differences in configuration, which can lead to erratic production deployments when code moves between environments.

If you test in production, you’ll have confidence knowing your users will experience the same functionality that was verified in testing.

Enforces More Frequent Deployments

To be able to test in production, your team will most likely need to change their general mindset of how deployments work. Gone are the days of months between deployments, with many features included in each deployment. These kinds of deployments inherently bring a lot of risk and often ship in a “good enough” state, with bugs found early on that take months to patch in the next deployment.

With more frequent deployments, you’ll be more agile. You can react to customer demand more flexibly, deploying changes as needed. Frequent deployments allow for flag-driven development, meaning development will be done with the mindset of having a feature flag that turns on the functionality when appropriate. Done correctly, this means a developer doesn’t need to worry about functionality “leaking” into higher environments, even if the functionality isn’t complete.

As we’ll see later in this article, using feature flags enhances this capability even more: they allow you to change functionality without deploying.
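
As a rough illustration of changing behavior without a deployment, the sketch below keeps flag state in a store that the running application consults on every call. The FlagStore class, the "new-checkout" flag name, and the checkout function are hypothetical names chosen for this example, not the API of any particular flag management tool. In practice, the updates would typically arrive from a remote flag service rather than from a local call.

```python
from threading import Lock


class FlagStore:
    """In-memory flag store; stands in for a remote flag service (assumption)."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}
        self._lock = Lock()

    def set(self, name: str, enabled: bool) -> None:
        # In a real system this update would come from the flag service,
        # not from a code change or a redeploy.
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags are treated as off, so unfinished code stays dark.
        with self._lock:
            return self._flags.get(name, False)


flags = FlagStore()


def checkout(cart_total: float) -> str:
    """Pick a code path at call time based on the current flag state."""
    if flags.is_enabled("new-checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"
```

Because the flag is read at call time, flipping it with flags.set("new-checkout", True) changes what checkout returns while the same deployed code keeps running.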

Allows for Seamless Transition Between Testing Phases

Testing in production expands the definition of testing. Instead of testing being a “does this work or not” scenario, it can also mean experimenting and learning how users respond to a particular feature.

For example, we may want to try a different UI. We can use the testing in production methodology to first test functionality via QA. After verification, we can use analytics tools to conduct A/B testing and collect data on real customers’ responses to the new functionality. (Consider managing your feature flags and your analytics independently—you’ll be able to integrate your flags with the analytics tools you already use and get the best of both worlds. We call this the “embrace don’t replace” philosophy.) 
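
One way to keep flags and analytics independent is to tag each analytics event with the variant the user saw and let the analytics side do the comparison. The sketch below is only an illustration under that assumption: the events list, the track function, and the event names "form_started" and "form_completed" are hypothetical stand-ins for whatever analytics tool is already in use.

```python
from collections import defaultdict

# Stand-in for an existing analytics pipeline (assumption): a list of events.
events: list[dict] = []


def track(user_id: str, name: str, variant: str) -> None:
    """Record an analytics event tagged with the flag variant the user saw."""
    events.append({"user": user_id, "event": name, "variant": variant})


def completion_rate_by_variant(log: list[dict]) -> dict[str, float]:
    """Share of users who started the form and also completed it, per variant."""
    started: dict[str, set] = defaultdict(set)
    completed: dict[str, set] = defaultdict(set)
    for e in log:
        if e["event"] == "form_started":
            started[e["variant"]].add(e["user"])
        elif e["event"] == "form_completed":
            completed[e["variant"]].add(e["user"])
    # Only variants with at least one "started" user appear in the result.
    return {
        variant: len(completed[variant] & users) / len(users)
        for variant, users in started.items()
    }
```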

All these benefits are massive, but of course, adopting a strategy like this also comes with challenges. Let’s look into some of the difficulties you may need to overcome in order to test in production.

Challenges in Testing in Production

Security Risks

When dealing with testing in production, security can be one of the thorniest issues to address. We’re no longer working only with dummy data, but with real, live data. This reality increases the necessity of handling data appropriately, meaning we must be careful when we test using existing customer data. Data protection regulations are often severe. Some software requires HIPAA compliance and comes with extreme costs for violations. Others, especially in the financial industry, contain plenty of personally identifiable information (PII). Data leakage can result in major lawsuits and much worse.

Consider limiting the number of tools that have access to this data: the more places this PII is stored, the worse the potential consequences may be for your organization. A tool such as CloudBees Feature Management provides this separation out of the box by not requiring user data to be stored at all; in fact, it’s impossible to do so with its architecture. This makes it easier to comply with enterprise security requirements, and to protect your organization and its customers.

Required Maturity of Deployment Capabilities

Testing in production requires a fairly mature deployment process to already be in place. 

First, you’ll need to have the means to make deployments quickly. You will have already moved away from heavily manual deployments, as these are riddled with risk due to inconsistency. Next, as mentioned above, you’ll need to make more frequent deployments to help safely introduce feature flags into your application. You also need the means to use feature flags dynamically: not only turning features on and off, but also changing functionality based on the user. Finally, the same flags need to let you turn a feature off gracefully if something goes wrong.
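
To make the idea of dynamic flags more concrete, here is a minimal sketch of a flag that is evaluated per user and also has a master switch for turning the feature off. The User and Flag classes, the "beta" group, and render_dashboard are hypothetical names for illustration only; real flag management tools expose their own targeting rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    user_id: str
    groups: frozenset[str] = frozenset()


@dataclass
class Flag:
    name: str
    enabled: bool = False  # master switch: False turns the feature off for all
    allowed_groups: set[str] = field(default_factory=set)  # empty = everyone

    def is_on_for(self, user: User) -> bool:
        if not self.enabled:
            return False
        if not self.allowed_groups:
            return True
        # On only if the user belongs to at least one targeted group.
        return bool(self.allowed_groups & user.groups)


def render_dashboard(user: User, flag: Flag) -> str:
    # Fall back to the existing behavior whenever the flag is off for this user.
    return "new dashboard" if flag.is_on_for(user) else "current dashboard"
```

Setting enabled to False sends every user back to the existing code path, which is one simple way to read "turned off gracefully": the old behavior is still deployed and remains the default.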

Managing feature flags at scale requires good governance in lockstep with your continuous integration/continuous deployment (CI/CD) pipelines—which is also essential if you want to test in production. Maintaining an enterprise-wide window into feature flag states is critical. Check out some recommendations for integrating feature flags into your pipelines with shared governance.

If you attempt to test in production without the appropriate capabilities and safeguards in place, a few things can occur:

  • Bad data can populate your production databases and, without a rollback plan, can require manual intervention to fix data. This issue poses risks from a security and functionality perspective.

  • Malfunctions might have unintended consequences, or even cause unintended downtime, for external applications. Fixing these side effects requires either using feature flags to change functionality based on request or load balancing multiple systems with different functionality, akin to canary releasing.

This might seem intimidating at first glance, and it might be even harder to get buy-in from a group that is behind the curve on modern deployment strategies. The difficulty comes in not only implementing these features, but also investing in the culture of progressive delivery. Given these two potential issues, it’s generally worth looking into a solution that can help with the technical implementation, such as CloudBees Feature Management.

How to Test in Production

Now that we’ve covered the pros and cons of this approach, let’s go over how to start applying it in your software development process. Getting started with testing in production really just revolves around two things:

  • Frequent deployment to the production environment

  • Granular activation and release of features to a growing number of users

Let’s go over the major factors in achieving each of those.

Frequent Deployments With Feature Flags

The first major step in starting to test in production is more frequent and smaller deployments. These deployments should come alongside feature flags that control activation in the environment.

Something to consider on deployment frequency: with feature flags, you can safely deploy “incomplete” code. You can deploy without worrying about whether the functionality is fully in place, because the flag keeps it inactive until you turn it on.

Granular Feature Flag Activation

Finally, we’re at the actionable part: actually testing in production. To do this, we start by using the feature flags that were deployed alongside the functionality. With those flags, we can not only turn features on and off, but also control which users have access to them.

As a first example, consider deploying a brand new feature. Using a feature flag, we deploy into production with the functionality turned off for all users. Once we’re ready, we turn on the feature flag for a subset of users belonging to an internal QA team. This team can perform manual testing to verify everything is in place. Once that’s done and we’re confident everything is working, we turn on the feature flag for all users. Since the internal testing was done in the same environment the users are using, we can be more confident that users will get working functionality.
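
The three steps in this example (off for everyone, on for QA, on for all users) can be sketched as a small state machine. The Stage enum, the QA_USERS set, and the account IDs in it are hypothetical and only serve to illustrate the progression.

```python
from enum import Enum


class Stage(Enum):
    OFF = "off"            # deployed, visible to nobody
    QA_ONLY = "qa_only"    # visible to the internal QA team only
    ALL_USERS = "all"      # released to everyone


# Hypothetical set of internal QA account IDs.
QA_USERS = {"qa-alice", "qa-bob"}


def feature_visible(user_id: str, stage: Stage) -> bool:
    """Decide whether a user sees the feature at the given rollout stage."""
    if stage is Stage.ALL_USERS:
        return True
    if stage is Stage.QA_ONLY:
        return user_id in QA_USERS
    return False
```

Moving from one stage to the next is a change of flag state only; the deployed code is the same at every step.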

As another example, consider releasing a new UI that replaces an existing workflow, such as filling out an application. We release the functionality using a feature flag but do so with a 20/80 split, randomly giving 20 percent of the users the new UI. We can record data on user satisfaction (surveys, completion speed, and other input) and change the ratio incrementally, depending on user reception. Assuming users respond well, we’ll transition to the new UI completely.

Or, maybe the UI isn’t received well. This outcome is unfortunate, but it means we just flip the feature flag to switch back to the old UI and go back to the drawing board. At first glance, this may seem like a waste of time, but consider that the negative response came early in the release of the UI, as opposed to further down the road. A negative response that appears later, after a full release, would mean even more lost time. Customers could lose confidence, and your team might not have the resources to quickly revert to a better solution.
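
A percentage split like the 20/80 example can be sketched with a stable hash of the user ID. This is one common technique rather than the method of any specific tool. It also swaps the purely random assignment described above for a deterministic one, so that a given user keeps seeing the same UI between visits. The function names and the "new-ui" flag name are hypothetical.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def sees_new_ui(user_id: str, rollout_percent: int, flag_name: str = "new-ui") -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percent
```

Raising rollout_percent from 20 to 50 keeps the original 20 percent in the new UI and adds more users, while setting it to 0 sends everyone back to the old UI, which corresponds to the rollback case.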

Testing in Production: A Milestone on the Road to Progressive Delivery

Although it may seem like a lot to take in, moving to a testing in production workflow will give your team the edge in releasing stable software and delivering functionality to customers more efficiently and reliably than your competition. If you’re interested in beginning to test in production but feel your team has a lot of ground to cover, consider using CloudBees Feature Management to quickly get up to speed.

This post was originally written by Dave Farinelli and first published on August 12, 2020. Kara Phelps has since updated it for freshness and continued relevance.
