A/B Testing with Feature Flags

Written by: Cordny Nederkoorn

Editor's Note: This guest blog post was written by Swaathi Kakarla, Skcript co-founder and chief technology officer. The article was first published on the CloudBees Rollout site.

I test in production.

You're probably wondering why I just said that; after all, no sane developer tests in production (or deploys on Fridays!). But many organizations do exactly that, and you've seen it yourself: for example, when a product gets updated to a dramatically new version and you're invited to try out the new "beta."

If you’re a Reddit user, you may have noticed this recently when it moved to its new UI and gave you the option of continuing on the old design or trying out the new one. Over time, the new design will become stable, and it will be the only way to access the UI (unless you dig deep into the settings panel).

Tons of products do this! Only a subset of users, however, get to experience the new feature. This subset is usually selected based on things like usage time, account type, or geography, and sometimes users are allowed to sign up for beta testing voluntarily. Allowing users to try out a new UI design or a new “beta” are actually ways in which organizations test in production using feature flags.

What are feature flags?

Feature flags allow you to act as a puppeteer for your application. You get to choose which features to turn on and off, just like the flag that signals racers to start. Feature flags are simple to create and yet very useful in testing.

To create a feature flag, you just take a conditional and wrap it around a feature; then you can toggle visibility either at runtime or in response to a user attribute. This is pretty straightforward when you’ve got one or two feature flags, but it can get overwhelming as your product matures.
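The pattern described above, wrapping a conditional around a feature, can be sketched in a few lines. The flag store, flag names, and user attributes here are all hypothetical, not part of any particular product:

```python
# A minimal hand-rolled feature flag. In practice the flag store would live
# in a config service or database, not a module-level dict.
FLAGS = {
    "new_dashboard": {"enabled": True, "beta_only": True},
}

def is_enabled(flag_name: str, user: dict) -> bool:
    """Return True if the flag is turned on for this user."""
    flag = FLAGS.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False
    if flag["beta_only"] and not user.get("is_beta_tester", False):
        return False
    return True

def render_dashboard(user: dict) -> str:
    # The conditional wrapped around the feature:
    if is_enabled("new_dashboard", user):
        return "new dashboard"
    return "old dashboard"
```

Toggling the feature is now a matter of flipping `enabled` in the store, rather than editing the rendering code, which is exactly what makes the conditional-wrapper pattern useful.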

Feature flags are also very useful for sales, marketing, and design teams, and for product managers; for example, they can help these teams study how users respond to a feature in isolation. In order to toggle features, however, they will need to reach out to the dev team -- and that's never fun. Worse still, a developer then has to find the relevant section of the codebase and toggle the flag by hand, which is neither safe nor scalable.

As feature flags pile up, you will need to manage their lifecycles. It will become increasingly important to properly retire flags, and you’ll also need to ensure that toggling one flag doesn’t affect another.

As your app and feature flags grow, it will be crucial for your business to adopt a more scalable solution for feature flag management. With CloudBees Rollout, organizations can manage feature flags with more efficiency and:

  • Set custom targeting rules

  • Gradually roll out and roll back features

  • Perform multivariate testing & experiments

  • Extract audit logs

  • Drill down on feature analytics

To learn more about implementing feature flags and managing them in your product, read the 5 best practices for feature flags.

Testing with feature flags

In most organizations, testing happens in closed staging environments with synthesized data. This only tells you whether your feature works; it doesn't tell you how users actually use your product.

A/B testing is a great way to figure out which features will work for your users, perform a staged rollout of big features, and iterate a feature quickly based on user feedback.

Performing an A/B test

So you want to do an A/B test with feature flags? Alright! Let's get to it.

Once you've decided which feature variations you want to test and set the appropriate flags for them, you can start measuring impact.

Step 1: Define user segments

First, you will need to decide which segment of your users will see the experimental features. Typically, you would select these users based on attributes such as longevity, geography, and account type. One popular division is the 90/10 split, where only 10% of your users see the beta features. If you want to hit the ground running and gather results more quickly, try a 50/50 split.
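A common way to implement a percentage split like 90/10 is deterministic bucketing: hash a stable user identifier and map it to a slot. The function below is an illustrative sketch, not any specific tool's API; hashing keeps the assignment stable, so the same user always lands in the same variant across sessions:

```python
import hashlib

def bucket(user_id: str, beta_percent: int = 10) -> str:
    """Deterministically assign a user to the 'beta' or 'stable' segment.

    The same user_id always maps to the same slot, so users don't
    flip between variants on every visit.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % 100  # a stable slot in 0..99
    return "beta" if slot < beta_percent else "stable"
```

With `beta_percent=10` roughly one user in ten sees the experimental feature; raising it to 50 gives the faster 50/50 split mentioned above.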

Step 2: Create goals

This will help you set a framework for measuring the impact of the new feature. It could be something as simple as seeing if users spend more time on the product, or as complex as calculating whether that time has increased by a certain percent. These questions can be answered by measuring standard metrics such as:

  • Number of page views

  • Duration of session

  • Series of buttons/workflows navigated

  • Bounce rates

  • Exit rates

  • Device types

Step 3: Track goals

With CloudBees Rollout, you can view metrics on a timeline, measure progress against previous performance, and more! CloudBees Rollout also supports all three major platforms: browsers, servers, and mobile phones -- with API support in a variety of languages. It excels on mobile, where you can hot-swap code without having to go through app store review.
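Whatever tool you use, the underlying idea is recording each of the metrics above per variant so the two groups can be compared later. A hand-rolled sketch (the metric names are illustrative):

```python
from collections import defaultdict

class MetricsTracker:
    """Record per-variant events so goals can be compared across variants."""

    def __init__(self) -> None:
        # Keyed by (variant, metric), e.g. ("beta", "page_view").
        self.counts = defaultdict(int)

    def record(self, variant: str, metric: str, value: int = 1) -> None:
        """Add an observation, e.g. one page view or N session seconds."""
        self.counts[(variant, metric)] += value

    def get(self, variant: str, metric: str) -> int:
        """Total recorded for this variant/metric pair."""
        return self.counts[(variant, metric)]
```

In a real system these counters would be emitted to an analytics backend rather than kept in memory, but the shape of the data, one tally per variant per goal, is the same.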

Step 4: Engage users

After deploying the feature flag and testing it with users, it's important to view the results of the experiment and act on them accordingly. Quality assurance teams (who are generally responsible for A/B testing) should share the results with customer service managers, solutions architects, and business users.

These groups have the most knowledge of how your customers use your product. They also have access to more user account data than QA engineers, which allows organizations to provide better support and build better features.

Step 5: Make changes

The goal of A/B testing is to provide insight into what works for your users with minimal risk. The data from these tests should be given to feature owners and developers so that they can make necessary changes. Once you've picked your winner, all feature flags must be safely removed. This ensures that any unused flags do not affect future tests and performance.
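One common, tool-agnostic way to decide whether you actually have a winner is a two-proportion z-test on conversion counts. This sketch assumes you've logged conversions and totals per variant; it is a simplification that ignores multiple comparisons and sequential peeking:

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z-statistic comparing two conversion rates.

    Positive z means variant B converts better than variant A;
    |z| > 1.96 corresponds to roughly 95% confidence.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

For example, 100 conversions out of 1,000 users on the control versus 150 out of 1,000 on the beta yields a z well above 1.96, which would justify rolling the beta out and retiring the flag.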

Benefits of A/B testing

A/B testing is a low-risk, high-reward construct for production testing. When implemented correctly, you can extract maximum value. Some benefits include:

  • Reduced bounce rates

  • Increased conversion rates

  • Higher value proposition

  • Reduced abandonment rates

  • Increased sales

Conclusion

A/B testing and feature flags go hand in hand. You will be able to gather user preferences and react to user feedback quickly while still delivering value. You can perform this test at any scale of the product; it doesn't require much data, but it is extremely useful. However, you should make sure to use a support tool that helps you monitor and manage feature flags, as they can quickly get out of hand.

If you want to know more about feature flagging, check the CloudBees Rollout website to sign up for a free trial of the product.

Swaathi Kakarla is the co-founder and CTO at Skcript. She enjoys talking and writing about code efficiency, performance and startups. In her free time, she finds solace in yoga, bicycling and contributing to open source.
