CloudBees Smart Tests Helps Leading Software Company Double Release Velocity by Saving 40K Testing Hours in One Year

Two-thirds of developers lose 8 or more hours each week to inefficiencies. CI pipelines are among the biggest culprits. Test suites grow, execution times get longer, and flaky tests start failing builds for reasons that have nothing to do with the code. Most engineering teams follow the same path in response. They retry the failures, ignore the worst offenders, and parallelize what they can, but none of it reduces the number of tests being run or identifies which ones actually matter for a given code change. The pipeline keeps getting slower, and the workarounds keep stacking up.

A growing category of test intelligence tools aims to break that cycle by applying smarter logic to which tests run, when they run, and how failures get triaged. But these tools vary widely in scope, compatibility, and approach. Some cover a single language or framework. Others are tied to a specific CI system. The right one for your team depends on how your pipelines are structured and where the bottlenecks live.

What Matters When Choosing a Test Intelligence Tool

There are many test intelligence tools out there, and they don't all solve the same problem or solve it in the same way. Before comparing the different options, it helps to know what to look for. The strongest tools in this space share a few things in common.

They shorten CI feedback loops. If a tool can identify which tests are most likely to catch real failures and run those first (or skip the rest entirely), it directly reduces pipeline runtime. The key differentiators are how a tool makes that determination, whether its accuracy holds across different test types and languages, and whether it can pinpoint which tests are failing and why.

They go beyond flagging flaky tests. They surface unhealthy tests with flakiness scoring and patterns that help teams fix root causes instead of constantly retrying failed builds.
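One way to make "flakiness scoring" concrete is sketched below. It is an illustration only: the record shape of (test name, commit, passed) tuples and the metric itself (the share of commits on which a test both passed and failed) are assumptions, not the scoring used by any tool discussed here.

```python
from collections import defaultdict


def flakiness_scores(runs):
    """Score each test by the fraction of commits with mixed outcomes.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples.
    This record shape is a hypothetical format chosen for illustration.
    """
    # test name -> commit -> set of outcomes observed on that commit
    outcomes = defaultdict(lambda: defaultdict(set))
    for test, commit, passed in runs:
        outcomes[test][commit].add(passed)

    scores = {}
    for test, by_commit in outcomes.items():
        # A commit counts as "mixed" when the same code both passed and failed.
        mixed = sum(1 for results in by_commit.values() if len(results) == 2)
        scores[test] = mixed / len(by_commit)
    return scores


# Hypothetical history: two tests observed on two commits.
runs = [
    ("test_login", "a1", True), ("test_login", "a1", False),
    ("test_login", "b2", True), ("test_login", "b2", True),
    ("test_export", "a1", False), ("test_export", "a1", False),
    ("test_export", "b2", True),
]
scores = flakiness_scores(runs)
```

In this made-up history, test_login gets a nonzero score because it passed and failed on the same commit, while test_export scores zero: it failed consistently on one commit and passed on the next, which looks like a real regression and fix rather than flakiness.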

They support multiple languages and frameworks. A tool limited to one language or one test framework can only help the teams that happen to use it. Broad support means more teams benefit from the same tool, with consistent test selection and flaky test detection across every pipeline. It also matters for where your architecture is heading. Teams adopt new languages, frameworks, and services over time, and a tool that only covers today's stack becomes a constraint when the stack evolves.

They work with any CI system. Some tools are tightly coupled to a specific CI platform. This is a big constraint for enterprises running multiple CI systems across different teams or business units. CI-agnostic tools offer flexibility without forcing consolidation, and they protect your investment if your CI strategy changes down the road.

They integrate without creating lock-in. Some tools require you to adopt a broader platform before you can access test intelligence features. That means migrating CI systems, retraining teams, and taking on a long-term dependency. A tool that integrates with your existing stack delivers value without forcing that tradeoff.

Test Intelligence Tools Compared

The test intelligence space includes purpose-built tools, CI platform add-ons, and various homegrown approaches. Each varies in scope, compatibility, and how much of the problem it actually solves. Here's a look at the leading tools in the space, the approach each one takes, and where their strengths and limitations lie.

CloudBees Smart Tests

CloudBees Smart Tests is a test intelligence capability within CloudBees Unify that works across any CI system and any test framework. It combines GenAI-powered test selection, flaky test detection, prioritization, and intelligent failure diagnostics in a single tool.

Rather than requiring teams to change how they write tests or migrate to a new CI platform, Smart Tests ingests test data from existing pipelines and uses it to shorten CI feedback loops, cut compute waste, and surface the failures that actually matter.

Key strengths:

CI-agnostic and framework-agnostic, so it scales across teams regardless of their tooling.
Combines test selection, flaky test detection, prioritization, and intelligent failure diagnostics in one solution rather than requiring separate tools for each.
No migration required. Integrates with existing pipelines and test workflows.

Tricentis (SeaLights)

Tricentis acquired SeaLights in 2024 to add quality intelligence capabilities to its broader testing portfolio. SeaLights uses a code-coverage-based approach to test selection. It maps code changes to relevant tests and flags coverage gaps. It supports multiple test types from unit to end-to-end.
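The general idea behind coverage-based selection can be sketched in a few lines. This is a simplified illustration, not the vendor's implementation: the `coverage_map` structure, the file names, and the `select_tests` helper are all hypothetical.

```python
def select_tests(coverage_map, changed_files):
    """Return tests whose recorded coverage touches any changed file."""
    changed = set(changed_files)
    return sorted(
        test for test, files in coverage_map.items() if changed & set(files)
    )


# Hypothetical coverage data: test name -> source files it executed.
coverage_map = {
    "test_checkout": ["cart.py", "pricing.py"],
    "test_pricing_rules": ["pricing.py"],
    "test_profile_page": ["profile.py"],
}

# A change to pricing.py selects the two tests that executed it.
selected = select_tests(coverage_map, ["pricing.py"])

# A change to a file that never appears in coverage data selects nothing,
# even if some tests depend on it at runtime.
unmapped = select_tests(coverage_map, ["config/prices.yaml"])
```

The second call shows where an indirect relationship can slip through: a configuration file that tests read at runtime never shows up as covered code, so a change to it maps to no tests.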

Key strengths:

Deep coverage analysis that identifies exactly which code changes are tested and where gaps remain.
Integrates with the wider Tricentis suite (Tosca, qTest, Testim) for teams already in that ecosystem.

Key limitations:

Requires instrumentation agents on the codebase, which adds setup complexity.
Language support, while growing, does not extend across all frameworks and test runners.
Relies on code-coverage-based test selection, which can miss tests where the relationship between code changes and test behavior is indirect.
Test intelligence is one piece of a larger, enterprise-scale Tricentis portfolio, which may mean more platform than teams need if test selection and flaky test detection are the primary goals.

Instrumentation agents add setup complexity before teams see any value, and code-coverage-based selection can miss tests where the link between code and behavior is indirect. Teams looking specifically for test selection and flaky test management may find themselves taking on more platform than the problem requires.

Trunk

Trunk started as a merge queue service and expanded into flaky test management. Its Flaky Tests product detects, quarantines, and tracks flaky tests by analyzing test results uploaded from CI jobs. It uses AI to group related failures and surface patterns, and integrates with GitHub, Jira, and Slack to keep teams in sync.

Key strengths:

Strong flaky test detection and automated quarantining that prevents flaky failures from blocking CI pipelines.
CI-agnostic data ingestion: it analyzes test results uploaded from CI jobs rather than requiring a specific CI system.
Integrates with GitHub, Jira, and Slack to keep flaky test ownership visible and ensure flagged tests don't get lost in a backlog.

Key limitations:

Focused on flaky test detection and quarantining. Does not offer predictive test selection or prioritization, meaning teams still run their full test suite on every change. Flaky test management addresses test reliability but not the underlying problem of pipeline runtime and compute cost.
Originated as a developer tooling company (merge queues, code quality linting), so test intelligence is an adjacent capability rather than the core product.

Trunk solves flaky test detection well, but that's where it stops. It won't reduce the number of tests you're running or tell you which ones matter most for a given change.

Gradle

Gradle offers Predictive Test Selection as part of Develocity, its broader build acceleration platform. It uses machine learning to identify and run only the tests most likely to provide useful feedback on a given code change, and includes configurable selection profiles (conservative, standard, and fast) so teams can balance speed against test confidence.

Key strengths:

Proven ML model with high prediction accuracy, used at scale by organizations like Netflix and Spring.
Part of a wider build performance suite that includes build caching and test distribution, giving JVM-focused teams a comprehensive acceleration strategy.

Key limitations:

Limited to Gradle and Maven builds, which means it only serves teams working in the JVM ecosystem.
Does not include flaky test detection, quarantining, or automated triage. Teams still need separate tooling to address test reliability alongside test selection.
Requires adoption of the Develocity platform, which may be more infrastructure than teams need if test intelligence is the primary goal.

For JVM shops, Develocity's test selection is mature and proven. Outside that ecosystem, it offers nothing, and teams still need separate tooling for flaky test detection and triage.

Datadog

According to Datadog's public documentation, Test Impact Analysis is part of Datadog Test Optimization, the company's dedicated test intelligence product. It uses code-coverage data to map tests to the files they touch, then skips tests unaffected by a given code change. Test Optimization is CI-agnostic and supports .NET, Java, JavaScript, Python, Swift, Ruby, and Go, with additional languages supported via JUnit XML.

Key strengths:

CI-agnostic. Works across CI providers without requiring teams to switch platforms.
Includes flaky test detection, prioritization, and triage workflows alongside Test Impact Analysis, giving teams a connected set of test intelligence capabilities within one product.
Natural fit for organizations already using Datadog for observability, since test data feeds into the same dashboards and monitoring workflows.

Key limitations:

Relies on code-coverage-based test selection rather than ML-driven prediction, which can miss tests where the relationship between code changes and test behavior is indirect.
Flaky test management requires manual action (deleting or ignoring flaky tests) rather than automatic quarantine, which means flaky tests can continue to disrupt pipelines until a human intervenes.
Test Optimization is part of the broader Datadog platform, so teams that don't already use Datadog take on significant platform overhead and a long-term observability commitment to access test intelligence.

Teams already using Datadog for observability get a natural extension. Everyone else is adopting an entire infrastructure monitoring stack to access a single CI capability.

Harness

Harness offers Test Intelligence as a feature of its CI platform. It uses a static call-graph approach that maps code changes to the unit tests that exercise the affected methods and classes, then selects only those tests for execution. Language support currently covers Java, Python, Ruby, C#, Kotlin, and Scala, with JavaScript (Jest) in beta.

Key strengths:

Integrated into a full CI platform alongside caching, security scanning, and deployment, which suits teams looking for a consolidated delivery toolchain.
Provides visibility into why each test was selected, with dashboards for failure rate analysis, test duration insights, and historical trends.

Key limitations:

Test Intelligence applies to unit tests only. Other test types (integration, end-to-end, UI) require separate configuration and do not benefit from intelligent selection.
Limited to Java, Python, Ruby, C#, Kotlin, and Scala, with JavaScript (Jest) still in beta. Teams working outside those languages have no access to test intelligence features.
Requires adoption of the Harness CI platform. Teams that want test intelligence without changing their CI system cannot use it as a standalone tool.

Test Intelligence only applies to unit tests and only works inside the Harness CI platform. If you're not already on Harness, adopting test intelligence means migrating your entire CI system first.

Homegrown Approaches

When engineering teams recognize that their test suites are costing too much time and compute, the first instinct is usually to solve the problem internally rather than invest in a dedicated test intelligence tool. The two most common approaches are parallelization and automatic test retry.

Parallelization splits test suites across multiple machines to reduce wall-clock time, but it doesn't reduce the total number of tests being run, so compute costs stay the same or increase.
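The arithmetic can be sketched with a simple greedy sharding function. The per-test durations are made up, and the model ignores per-machine setup overhead, which in practice would push total compute up rather than down.

```python
import heapq


def shard(durations, workers):
    """Greedy longest-first assignment of tests to workers.

    Returns (wall_clock_seconds, total_compute_seconds).
    """
    loads = [0.0] * workers
    heapq.heapify(loads)
    for seconds in sorted(durations, reverse=True):
        # Give the next-longest test to the currently lightest worker.
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + seconds)
    return max(loads), sum(loads)


durations = [300, 240, 180, 120, 60]  # hypothetical per-test runtimes, seconds
serial = shard(durations, 1)
parallel = shard(durations, 3)
```

With these numbers, three workers cut wall-clock time from 900 seconds to 300, but the total compute consumed is 900 seconds in both cases: every test still runs.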

Automatic retry reruns failed tests on the assumption that intermittent failures will pass on a second attempt. This keeps builds green in the short term but masks flaky tests rather than surfacing them, and adds cumulative time to every pipeline run.
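A rough model shows both effects. It assumes each attempt fails independently with a fixed probability, which is a simplification, and the failure rate and test duration are invented for illustration.

```python
def retry_effect(fail_rate, max_attempts, seconds_per_attempt):
    """Return (probability the test ends green, expected seconds spent).

    Assumes each attempt fails independently with probability `fail_rate`.
    """
    pass_probability = 1 - fail_rate ** max_attempts
    # Attempt k+1 only runs if the first k attempts all failed.
    expected_attempts = sum(fail_rate ** k for k in range(max_attempts))
    return pass_probability, expected_attempts * seconds_per_attempt


# Hypothetical flaky test: fails 20% of the time, takes 60 seconds per attempt.
no_retry = retry_effect(0.2, 1, 60)
with_retry = retry_effect(0.2, 3, 60)
```

Under these assumptions, allowing three attempts makes a test that fails one time in five look green about 99% of the time, while its expected cost per run rises from 60 seconds to roughly 74. The flakiness is hidden, not fixed, and the extra time is paid on every pipeline run.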

For smaller teams with manageable test suites, these approaches can be enough to keep things moving. But both treat symptoms without addressing root causes. Neither identifies which tests actually matter for a given code change, detects flaky tests systematically, or provides data that teams can use to prioritize fixes. At enterprise scale, that gap gets expensive. Retry cycles stack up, compute costs increase, and the pipeline gets slower with every test added.

Feature Comparison

| Capability | Description | CloudBees Smart Tests | Tricentis (SeaLights) | Trunk | Gradle Develocity | Datadog | Harness TI |
|---|---|---|---|---|---|---|---|
| Predictive Test Selection | Run only the tests likely to catch failures for a given code change | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
| Multi-Language Support | | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ |
| CI-Agnostic | Works with Jenkins, GitHub Actions, GitLab CI, CircleCI, and others | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| Coverage Across Multiple Test Types | Unit, integration, end-to-end, and API tests | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ |
| Flaky Test Detection | Identify tests with inconsistent results across runs | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ |
| Intelligent Failure Diagnostics | Surface and group failures to accelerate root cause analysis | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ |
| Standalone Product | No requirement to adopt a broader CI or observability platform | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Custom / Proprietary Framework Support | Works with in-house test frameworks without language-specific instrumentation | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Test Selection from First Run | No baseline run required before selection takes effect | ✓ | ✗ | – | ✗ | ✗ | ✗ |

Why Enterprise Teams Choose CloudBees Smart Tests

Many tools in this space solve part of the test intelligence problem. Some focus on flaky test detection. Others offer test selection but only for specific languages or build tools. Beyond scope, there's also the question of access. Several are bundled into larger platforms that require teams to change their CI system before they can access test intelligence features.

CloudBees Smart Tests covers the full scope: test selection, flaky test detection, prioritization, and automated triage. It works across any CI system, every major language (Java, Python, JavaScript, Go, C++, .NET, Ruby, and Perl), and any test framework, including custom and proprietary ones, so teams get consistent results across every pipeline without consolidating their toolchain or rewriting tests.

In practice, the provider of a DevOps test data management platform used CloudBees Smart Tests to cut regression testing time by 80%, reduce pre-commit testing time from six hours to two, and double its annual release velocity, without changing how tests were written or migrating to a new CI system.

For enterprise engineering organizations running multiple CI systems and languages across different teams, that combination of breadth and depth makes a big difference. Instead of stitching together point solutions or accepting the limits of a platform-locked feature, teams get a purpose-built tool that integrates with what they already have and delivers value from the first pipeline run.

Find out more about how CloudBees Smart Tests fits into your pipeline, and start reducing test cycle time.

Your CI Pipeline Deserves Better Than One-Size-Fits-All Test Intelligence
