CloudBees Smart Tests: How AI Selects the Right Tests to Run

Not every test matters for every code change.

Running all of them anyway drives up CI costs and stretches feedback loops into hours. It also fills build reports with flaky noise, turns triage into a manual slog, and erodes developer trust in the build signal.

AI-driven test selection is how CloudBees Smart Tests fixes this.

How Smart Tests Selects Which Tests to Run

Smart Tests uses GenAI-based semantic analysis to select only the tests relevant to each change, comparing what changed in the code against what each test exercises and running the tests most likely to catch a real failure. The result is less compute waste and faster feedback on every PR. CloudBees customers report a 35–70% reduction in CI runtime on their own pipeline data.

Selection is available from the very first run. No training period, no pipeline data accumulation required.

Semantic Similarity, Not Statistical Pattern-Matching

Most test selection approaches build statistical models from historical pass/fail patterns, which means they need weeks of pipeline data before they can make reliable selections.

Smart Tests uses GenAI-based semantic similarity. It reads the code itself: what the changed code does, and what each test exercises. For example, a change to discount logic triggers pricing and checkout tests. Authentication tests don’t run. No historical data needed, no prior failures required.

This is why selection is available immediately. The system doesn’t need to observe your pipeline over time. It understands the relationship between code and tests from the start.

Predictive Test Selection

Once Smart Tests is integrated, every commit triggers semantic analysis. The changed files are compared against the full test suite, tests are ranked by relevance, and only the highest-scoring subset runs.
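The compare, rank, and select step can be pictured with a small sketch. It is not Smart Tests' implementation. It assumes each changed file and each test has already been turned into an embedding vector by some model (the three-dimensional vectors below are hypothetical toys). It then scores each test by its cosine similarity to the closest changed file and keeps tests above a hypothetical `min_score` cutoff.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_tests(
    changed: dict[str, list[float]],
    tests: dict[str, list[float]],
    min_score: float,
) -> list[tuple[str, float]]:
    """Score each test against its closest changed file and keep the high scorers.

    Returns (test_name, score) pairs, most relevant first.
    """
    scored = [
        (name, max(cosine(vec, c) for c in changed.values()))
        for name, vec in tests.items()
    ]
    kept = [(name, s) for name, s in scored if s >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; the axes loosely mean (pricing, checkout, authentication).
changed_files = {"discounts.py": [0.9, 0.4, 0.0]}
test_suite = {
    "test_pricing": [1.0, 0.1, 0.0],
    "test_checkout": [0.3, 1.0, 0.0],
    "test_login": [0.0, 0.0, 1.0],
}

for name, score in select_tests(changed_files, test_suite, min_score=0.5):
    print(f"{name}: {score:.2f}")
# Prints test_pricing: 0.95, then test_checkout: 0.65; test_login scores 0.00 and is dropped.
```

With these toy vectors, the discount change keeps the pricing and checkout tests and drops the login test, mirroring the discount example earlier. Taking the maximum over changed files means one strongly related change is enough to pull a test into the subset.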

The model also prioritizes tests by failure probability within that selected subset, so the tests most likely to surface a real issue run first, and failures appear earlier without waiting through the full suite.
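Ordering inside the selected subset comes down to a sort on estimated failure probability. In this sketch the probabilities are hypothetical inputs, since Smart Tests' model produces its own. The tie-break on shorter duration is an added assumption so that equally risky tests report back sooner.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    # Hypothetical model output: estimated chance this test fails for the change.
    failure_probability: float
    duration_s: float


def execution_order(selected: list[Candidate]) -> list[str]:
    """Run the likeliest failures first; among equal risks, run shorter tests first."""
    ordered = sorted(selected, key=lambda c: (-c.failure_probability, c.duration_s))
    return [c.name for c in ordered]


subset = [
    Candidate("test_checkout_totals", 0.10, 40.0),
    Candidate("test_discount_rounding", 0.35, 5.0),
    Candidate("test_coupon_expiry", 0.35, 2.0),
    Candidate("test_price_display", 0.02, 1.0),
]
print(execution_order(subset))
# → ['test_coupon_expiry', 'test_discount_rounding', 'test_checkout_totals', 'test_price_display']
```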

Smart Tests applies predictive test selection in two ways, and most teams end up using both:

Test Faster: Runs only the tests relevant to the specific change, so each run finishes sooner and uses less compute.
Test Earlier: Moves validation earlier in the pipeline. Instead of finding failures in the nightly build, developers get feedback on their PR the same day, while the context is still fresh.

CloudBees Smart Tests doesn’t skip or fake tests. It runs the subset most likely to detect failures for that specific change.

The Confidence Curve Shows How Much Testing Each Change Needs

Smart Tests includes a confidence curve that shows how much testing each change needs to catch real failures, built from your actual pipeline data rather than a generic benchmark.

It maps the relationship between runtime and failure coverage. For example, if running the full suite for 100 minutes detects 100% of historical failures, running only the highest-ranked tests for 40 minutes may detect 95% of them.
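One way to build such a curve from history is shown below. It is a simplified reconstruction, not the product's method, and rests on three assumptions: tests run in a fixed ranked order, their durations are known, and a historical failing build counts as detected as soon as any one of its failing tests has run. All test names and numbers are hypothetical.

```python
from itertools import accumulate


def confidence_curve(
    order: list[str],
    durations: dict[str, float],
    failing_builds: list[set[str]],
) -> list[tuple[float, float]]:
    """Return (minutes_run, fraction_of_failing_builds_caught) after each test.

    A historical failing build is caught once any of its failing tests has run.
    """
    elapsed = list(accumulate(durations[name] for name in order))
    curve = []
    for i, minutes in enumerate(elapsed):
        ran = set(order[: i + 1])
        caught = sum(1 for failed in failing_builds if failed & ran)
        curve.append((minutes, caught / len(failing_builds)))
    return curve


# Hypothetical ranked suite (durations in minutes) and past failing builds.
order = ["a", "b", "c", "d"]
durations = {"a": 10.0, "b": 30.0, "c": 20.0, "d": 40.0}
failing_builds = [{"a"}, {"b"}, {"a", "c"}, {"b", "d"}, {"d"}]

for minutes, caught in confidence_curve(order, durations, failing_builds):
    print(f"{minutes:.0f} min -> {caught:.0%} of failing builds caught")
# Prints 10 min -> 40%, 40 min -> 80%, 60 min -> 80%, 100 min -> 100%.
```

The flat stretch between 40 and 60 minutes is runtime that buys no extra coverage, which is the kind of cost the curve is meant to expose.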

This turns the default of running the full suite into a deliberate trade-off between speed and coverage. Teams set a confidence threshold based on where they are in the pipeline. A PR at the dev stage might run at 80% confidence for faster feedback. Later pipeline stages run at higher thresholds. The nightly build still runs the full suite.
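Picking a subset for a given threshold amounts to walking the curve until the detection rate reaches the target. The curve points and per-stage thresholds below are hypothetical, chosen to match the 40-minute, 95% example above. How Smart Tests stores or applies thresholds is not shown here.

```python
def minutes_for_confidence(curve: list[tuple[float, float]], target: float) -> float:
    """Smallest runtime whose detection rate reaches the target.

    Assumes the curve is sorted by runtime and its last point is the full suite.
    """
    for minutes, detected in curve:
        if detected >= target:
            return minutes
    # Target not reachable by any subset: fall back to the full run.
    return curve[-1][0]


# Hypothetical curve: (minutes run, fraction of historical failures detected).
curve = [(10, 0.50), (25, 0.80), (40, 0.95), (70, 0.99), (100, 1.00)]

# Hypothetical per-stage thresholds; the nightly build runs everything.
stage_targets = {"pull_request": 0.80, "merge": 0.95, "nightly": 1.00}

for stage, target in stage_targets.items():
    print(f"{stage}: run {minutes_for_confidence(curve, target)} min for {target:.0%}")
# Prints pull_request: 25 min for 80%, merge: 40 min for 95%, nightly: 100 min for 100%.
```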

The confidence curve makes the cost of running everything visible, so teams can decide when the full suite is worth running and when a targeted subset is enough.

Failure Grouping Cuts Dozens of Red Tests to Root Causes

While Smart Tests’ predictive test selection reduces runtime, failure grouping reduces triage time.

A failing build typically surfaces dozens of individual failures with no clear starting point. A developer opens the report and reads through them one by one, looking for the pattern.

CloudBees Smart Tests does that work automatically: grouping failures by shared root cause, prioritizing the most impactful ones first, linking each group to the commit that triggered it, and notifying the right developer.

Groups Failures by Root Cause

Smart Tests clusters failures that share a common cause. A network timeout affecting 30 test cases appears as one issue rather than 30 separate failures to investigate.

AI-driven test intelligence then ranks these issues by the number of tests they affect, so developers know where to start.
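Smart Tests does the clustering with AI. As a much simpler stand-in, the sketch below groups failures by a normalized error signature, with numbers, hex ids, and quoted values stripped, and then ranks the groups by how many tests they affect. The regex rules and sample messages are hypothetical, and this approach would miss a shared root cause that surfaces with different messages.

```python
import re
from collections import defaultdict


def signature(message: str) -> str:
    """Reduce an error message to a coarse root-cause key.

    Replaces the parts that vary between occurrences so the same underlying
    error maps to the same key. Hex ids go first so their digits are not split.
    """
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", message)
    msg = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", msg)
    msg = re.sub(r"\d+", "<n>", msg)
    return msg.strip()


def group_failures(failures: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Group failing tests by signature, largest group first."""
    groups: dict[str, list[str]] = defaultdict(list)
    for test, message in failures.items():
        groups[signature(message)].append(test)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


failures = {
    "test_cart_total": "TimeoutError: connect to 10.0.0.5:443 timed out after 30s",
    "test_apply_coupon": "TimeoutError: connect to 10.0.0.7:443 timed out after 30s",
    "test_order_history": "TimeoutError: connect to 10.0.0.5:443 timed out after 31s",
    "test_discount_rounding": "AssertionError: expected '19.99' but got '20.00'",
}
for key, tests in group_failures(failures):
    print(f"{len(tests)} test(s): {key}")
# 3 test(s): TimeoutError: connect to <n>.<n>.<n>.<n>:<n> timed out after <n>s
# 1 test(s): AssertionError: expected <str> but got <str>
```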

Traces Each Group to Its Commit

Each failure group links directly to the commit that most likely introduced the issue: Smart Tests correlates failure timing with recent commits in the repository. Developers get a precise starting point without having to dig through version history.
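The commit-tracing idea can be sketched as finding the build where a failure group most recently went from passing to failing, then flagging the commits that landed in that build. This assumes builds are processed in order and that each build records the commits added since the previous one. The `Build` type and sample data are hypothetical, and Smart Tests' actual correlation logic is not described in this detail.

```python
from dataclasses import dataclass


@dataclass
class Build:
    number: int
    new_commits: list[str]  # commits added since the previous build, oldest first
    failing_tests: set[str]


def suspect_commits(history: list[Build], group: set[str]) -> list[str]:
    """Commits from the build where this failure group last went from pass to fail."""
    suspects: list[str] = []
    was_failing = False
    for build in history:  # oldest to newest
        is_failing = bool(build.failing_tests & group)
        if is_failing and not was_failing:
            # Pass-to-fail transition: this build's new commits become the suspects.
            suspects = list(build.new_commits)
        was_failing = is_failing
    return suspects


history = [
    Build(101, ["a1b2c3"], set()),
    Build(102, ["d4e5f6", "0718aa"], {"test_cart_total", "test_apply_coupon"}),
    Build(103, ["9c0ffe"], {"test_cart_total", "test_apply_coupon"}),
]
print(suspect_commits(history, {"test_cart_total", "test_apply_coupon"}))
# → ['d4e5f6', '0718aa']
```

When one build batches several commits, the sketch returns all of them as candidates. Narrowing to a single commit would need extra signals, such as which files each commit touched, or a `git bisect` run.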

Notifies the Right Developer Immediately

The developer whose commit triggered the failure gets a Slack notification at the time of the failure. Jira surfaces the ticket with full context attached. Decisions that used to wait until the next morning’s standup happen at the time of the commit.

Smart Tests cuts manual triage from hours to minutes, getting the right developer working on the right fix immediately.

AI Deprioritizes Flaky Tests From Your Build Signal

CloudBees Smart Tests tracks each test’s behavior over time and automatically deprioritizes the ones that fail without a corresponding code change.

Flaky tests will always exist in enterprise development. The question is whether your team spends hours chasing them. When developers can’t tell whether a red build means a real problem or just another flaky test, they rerun pipelines out of habit. Red stops meaning anything.

Smart Tests surfaces which tests are flaky, how often they fail, and whether the rate is improving or getting worse, so teams have the visibility to address the underlying issue. The model deprioritizes those tests in your results automatically. They remain visible but stop polluting the signal.
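A minimal sketch of this kind of tracking, assuming run results arrive as `(test, commit, passed)` tuples in chronological order. It treats a test as flaky when it both passed and failed on the same commit, which is a failure without a corresponding code change. It reports each flaky test's failure rate and whether recent runs are better or worse than earlier ones. It then sorts failures so flaky ones stay visible but come last. The data and the `recent` window size are hypothetical.

```python
from collections import defaultdict


def failure_rate(results: list[bool]) -> float:
    """Fraction of runs that failed; False marks a failed run."""
    return results.count(False) / len(results) if results else 0.0


def flaky_summary(
    runs: list[tuple[str, str, bool]], recent: int
) -> dict[str, tuple[float, str]]:
    """Map each flaky test to (overall failure rate, trend). `recent` must be >= 1."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    history: dict[str, list[bool]] = defaultdict(list)
    for test, sha, passed in runs:
        outcomes[(test, sha)].add(passed)
        history[test].append(passed)

    # Flaky: both a pass and a fail recorded for the same commit.
    flaky = {test for (test, _sha), seen in outcomes.items() if seen == {True, False}}

    summary = {}
    for test in sorted(flaky):
        results = history[test]
        earlier, latest = results[:-recent], results[-recent:]
        if not earlier:
            trend = "not enough history"
        elif failure_rate(latest) > failure_rate(earlier):
            trend = "getting worse"
        elif failure_rate(latest) < failure_rate(earlier):
            trend = "improving"
        else:
            trend = "unchanged"
        summary[test] = (failure_rate(results), trend)
    return summary


def triage_order(failed: list[str], flaky: dict[str, tuple[float, str]]) -> list[str]:
    """Real failures first; flaky failures stay in the list but move to the end."""
    return sorted(failed, key=lambda test: test in flaky)


runs = [
    ("test_login", "c1", True),
    ("test_login", "c1", False),   # same commit, different outcome: flaky
    ("test_search", "c1", True),
    ("test_login", "c2", True),
    ("test_search", "c2", False),  # fails only on a new commit: treated as real
    ("test_login", "c3", False),
    ("test_login", "c3", False),
]
flaky = flaky_summary(runs, recent=2)
print(flaky)  # → {'test_login': (0.6, 'getting worse')}
print(triage_order(["test_login", "test_search"], flaky))  # → ['test_search', 'test_login']
```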

A red build means a real failure. A green build means the change is clean.

Beyond individual builds, the reporting layer tracks failure ratios, flaky test rates, and long-running tests, giving QA leads ongoing visibility into overall test suite health.

From Running Everything to Running What Matters

Smart Tests integrates with CI systems, including Jenkins®, GitHub Actions, and GitLab, as well as test frameworks such as Playwright, Selenium, Cucumber, and any tool that produces JUnit-format reports. You don’t need to migrate anything or add new tooling.

Every change now runs only the tests CloudBees Smart Tests predicts will catch a real failure. The result is faster feedback, lower compute costs, and a more reliable CI signal.
