The Why and How You Should Test in Production

Written by: Kiley Nichols

The following is a guest blog post written by Dave Farinelli.

When developing and releasing software, the standard workflow is to build, test locally and finally release the software through a deployment. Usually, testing is confined to lower environments, with the expectation that production is reserved for customer use only. But what about testing right in the production environment?

For some, this is going to sound strange. Testing in production? Isn’t that reckless? Of course, it can be if improperly applied. However, companies such as Netflix and Facebook adopt these strategies, allowing them to safely deploy functionality to millions of subscribers. A feature flag deployment strategy makes deployments malleable: it gives granular control over software functionality and shortens the lead time for getting features into production, which leads to better customer retention.


Testing in production requires a fairly mature deployment strategy and can require an initial buy-in to a flag management tool if one isn’t already in place. This article covers the major pros and cons of testing in production. After that, I’ll walk through the main steps of testing in production yourself.

First, let’s look at some of the benefits of testing in production.

Benefits to Testing in Production

Provides Greater Testing Accuracy on Deployments

When it comes to testing new functionality, there’s no better place to do so than the environment your users will actually use. Lower environments tend to come with either inexact data or differences in configuration, which can result in erratic production deployments when changes move between environments.

If you test in production, you’ll have confidence knowing your users will experience the same functionality that was verified in testing.

Enforces More Frequent Deployments

Testing in production requires a change in the general mindset of how deployments work. Gone are the days of months between deployments, with many features included in each one. Those deployments inherently bring a lot of risk, and many times they ship as “good enough,” with bugs found early on that take months to patch in the next deployment.

With more frequent deployments, you’ll be able to react to customer demand with more agility, deploying changes as needed. Frequent deployments allow for flag-driven development, meaning development is done with the mindset of having a feature flag that turns on the functionality when appropriate. Done correctly, this means a developer doesn’t need to worry about functionality “leaking” into higher environments, even if the functionality isn’t complete.


As we’ll address later, using feature flags enhances this capability even more, allowing you to change functionality without the need for a deployment.
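As a minimal sketch of flag-driven development, assuming a hypothetical in-memory flag store (the names `FLAGS`, `is_enabled`, and the checkout functions are invented for illustration and are not part of any specific tool), the unfinished code path ships with every deployment but stays unreachable until its flag is turned on:

```python
# Hypothetical in-memory flag store. A real system would read these values
# from a flag management service or config so they can change without a deploy.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str) -> bool:
    """Return the flag's current state; unknown flags default to off."""
    return FLAGS.get(flag_name, False)


def legacy_checkout(cart_total: float) -> str:
    return f"legacy checkout: {cart_total:.2f}"


def new_checkout(cart_total: float) -> str:
    # Unfinished work can live here safely while the flag stays off.
    return f"new checkout: {cart_total:.2f}"


def checkout(cart_total: float) -> str:
    """Route to the new code path only when its flag is on."""
    if is_enabled("new_checkout"):
        return new_checkout(cart_total)
    return legacy_checkout(cart_total)
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a typo in a flag name then hides a feature rather than exposing it.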

Allows for Seamless Transition Between Testing Phases

Testing in production in itself expands the definition of testing. Instead of testing being only a “does this work or not” scenario, it expands to include “testing” how users respond to a particular feature.

For example, we may want to try a different UI. We can use the testing in production methodology to first test functionality via QA. After verification, we can use real customers alongside A/B testing to collect data on the response to the new functionality.

These benefits are great, but of course, adopting a strategy like this comes with its challenges. Let’s look into some of the challenges to overcome to test in production.

Challenges in Testing in Production

Security Risks

When testing in production, security can be one of the most challenging things to address. We’re no longer working only with dummy data but with real live data. This reality raises the stakes of handling data appropriately, meaning we must be careful when testing with existing customer data. How strict data protection must be depends on your application. Some software requires HIPAA compliance and comes with extreme costs for violations. Others, especially in the financial industry, contain plenty of personally identifiable information (PII). Data leakage can result in major lawsuits and much worse.

A consideration here is to limit the number of tools that have access to this data. The more places this PII is stored, the greater the exposure. A tool such as CloudBees Feature Flags provides this separation out of the box by not requiring this data to be stored for use. In addition, CloudBees has vetted the security of the tool.

Required Maturity of Deployment Capabilities

Testing in production requires a fairly mature deployment process to already be in place. First, you’ll need the means to make deployments quickly, which means moving away from heavily manual deployments, as these are riddled with risk due to inconsistency. Next, as mentioned above, you’ll need to make more frequent deployments to allow for safely introducing feature flags into your application. You also need the means of using feature flags dynamically: not only turning features on and off, but also changing functionality based on the user. Finally, these same flags need to be in place in case a feature needs to be turned off gracefully.
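To make “dynamic” concrete, here is a minimal sketch, assuming a hypothetical flag record with a global kill switch and a simple per-user rule. The `Flag` and `User` classes and the `plan` attribute are illustrative assumptions, not the data model of any particular flag management product:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Flag:
    """Hypothetical flag record: a global kill switch plus per-user targeting."""
    # Kill switch: False turns the feature off for everyone.
    enabled: bool = False
    # Plans allowed to see the feature; an empty set means no restriction.
    allowed_plans: set[str] = field(default_factory=set)


@dataclass
class User:
    user_id: str
    plan: str


def is_enabled_for(flag: Flag, user: User) -> bool:
    """Evaluate the flag for one user at request time."""
    if not flag.enabled:
        return False  # the kill switch wins over any targeting rule
    if not flag.allowed_plans:
        return True
    return user.plan in flag.allowed_plans
```

Because the check runs on every request, flipping `enabled` to `False` in the flag store takes the feature away from all users without a new deployment, which is the graceful shutoff described above.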

Using feature flags at this level of granularity across the board makes a good experience for the people managing those flags even more critical. Two points matter most here: a good user interface for toggling specific feature flags, and the means to control flags across an enterprise. The latter becomes especially important in a large company as feature flag deployment becomes more widespread.

There are a few things that can occur if you are attempting to test in production without the appropriate deployment capabilities:

  • Bad data can populate your production databases and, without a rollback plan, can require manual intervention to fix data. This issue poses risks from a security and functionality perspective.

  • There might be unintended consequences of malfunctions or even unintended downtime for external applications. Fixing these side effects requires either using feature flags to change functionality based on request or load balancing multiple systems with different functionality, akin to canary releasing.

This might seem intimidating at first glance, and it might be even harder to get buy-in from a group that is behind the curve on modern deployment strategies. The difficulty lies not only in implementing these capabilities, but also in getting buy-in for a culture of continuous delivery. Given these two potential issues, it’s generally worth looking into purchasing a solution that can help with the technical implementation, such as CloudBees Feature Flags.

How to Test in Production

Now that we’ve covered the pros and cons of testing in production, let’s go over how to start applying the approach in your software development process. Getting started with testing in production really revolves around two things:

  • Frequent deployment to the production environment

  • Granular activation and release of features to a growing number of users

Let’s go over the major factors in doing each of those.

Frequent Deployments With Feature Flags

The first major step in starting to test in production is more frequent and smaller deployments. These deployments should come alongside feature flags that control activation in the environment.


Something to consider on deployment frequency: feature flags allow you to deploy software with incomplete functionality. This way, you can deploy without worrying about whether the functionality is complete.

Granular Feature Flag Activation

Finally, we’re at the actionable part: actually testing in production. To do this, we start by using the feature flags deployed alongside the functionality in those deployments. We can not only turn features on and off with those feature flags, but also control which users have access to said features.

As a first example, consider deploying a brand new feature. Using a feature flag, we deploy into production with the functionality turned off for all users. Once we’re ready, we turn on the feature flag for a subset of users who belong to an internal QA team. This team can perform manual testing to verify everything is in place. Once that’s done and we’re confident everything is working, we turn on the feature flag for all users. Since the internal testing was done in the same environment the users are using, our confidence that the users will have working functionality is improved.
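The three stages in this example (off for everyone, on for QA only, on for everyone) can be sketched as follows. The stage names and the `QA_TEAM` user IDs are hypothetical, chosen only to illustrate the flow:

```python
from enum import Enum


class Stage(Enum):
    OFF = "off"            # deployed, but hidden from all users
    QA_ONLY = "qa_only"    # visible to the internal QA team only
    EVERYONE = "everyone"  # fully released


# Hypothetical internal tester IDs; a real tool would manage this group.
QA_TEAM = {"qa-alice", "qa-bob"}


def can_see_feature(stage: Stage, user_id: str) -> bool:
    """Decide visibility for one user given the flag's current stage."""
    if stage is Stage.EVERYONE:
        return True
    if stage is Stage.QA_ONLY:
        return user_id in QA_TEAM
    return False
```

Moving from `QA_ONLY` to `EVERYONE` changes only the stage value, so the code that QA verified is the same code customers later run.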

As another example, consider releasing a new UI that replaces an existing workflow, such as filling out an application. We release the functionality using a feature flag but do so with a 20/80 split, randomly giving 20 percent of the users the new UI. We can record data on user satisfaction (surveys, speed of completion and others) and change the ratio appropriately, depending on user reception. Assuming users respond well, we’ll transition to the new UI completely.

Another possibility is that the UI isn’t received well. This outcome is unfortunate, but it means we just switch the feature flag back to the old UI and go back to the drawing board. At first glance, this may seem like a waste of time, but consider that the poor response came early in the release of said UI, as opposed to further down the road. Having the poor response appear after a full release might result in worse consequences, such as lost customer confidence and difficulty in reverting to a better solution.
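One common way to implement a split like this is to hash each user into a stable bucket instead of rolling a random number on every request, so a given user always sees the same UI. The sketch below assumes that approach; the flag name `new_application_ui` and both function names are hypothetical:

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for this flag."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def sees_new_ui(
    user_id: str,
    rollout_percent: int,
    flag_name: str = "new_application_ui",  # hypothetical flag name
) -> bool:
    """Return True if the user falls inside the rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percent
```

Calling `sees_new_ui(user_id, 20)` gives roughly 20 percent of users the new UI. Raising the percentage only adds users, since anyone whose bucket was below 20 is also below 50, and setting it to 0 sends everyone back to the old UI.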

Getting an Edge in Feature Delivery by Testing in Production

Although it may seem like a lot to take in, moving to a testing in production workflow will give your team the edge in releasing stable software and delivering functionality to customers quicker than your competition. If you’re looking to test in production but feel your team has a lot of ground to cover, consider using CloudBees Feature Flags to quickly get up to speed.

Dave Farinelli is a senior software engineer with over eight years of experience. His specialty is in providing enterprise-level solutions for healthcare and insurance clients. Dave holds a B.S. in computer engineering from Kettering University in Flint, Michigan.
