Agentic DevOps World 2026: Key Takeaways

AI is generating code faster than most enterprises can govern, test, or ship it. That's the central finding of our new research report, the State of Code Abundance 2026, based on a survey of more than 200 enterprise technology leaders.

To dig into what the data means in practice, we brought together senior engineering and technology leaders for Agentic DevOps World, a virtual summit that ran alongside the report launch.

The summit featured contributions from Anuj Kapur, Shawn Ahmed, Loreli Cadapan, Anthony Aquilio, Avan Mathur, and Johanie Marcoux (CloudBees), alongside Gerard McMahon (Fidelity Investments), Phil Nash (IBM), Madison Mills (Axios), and Mike Vizard (Techstrong Group).

Here's what speakers and panelists had to say about the challenges they’re encountering and how they’re tackling them.

Confidence Is Outpacing Reality

Most enterprise leaders believe they have AI under control. Nearly all of them (92%) express confidence in the production readiness of AI-generated code before it ships. Yet 81% have seen an increase in production issues linked to that same code.

CloudBees CEO Anuj Kapur says this paradox comes from executives conflating movement with progress.

"When the scoreboard doesn't exist, everyone feels like they're winning," he says.

On the ROI side, organizations can only attribute 31% of their AI-related spend to specific business outcomes, and 36% track AI spend without measuring ROI at all.

"We're probably two quarters away from both investors and boards saying, ‘Well, you spent all this, and what did I get for it?’" Anuj says.

Shawn Ahmed, CloudBees CPO, says that most organizations are measuring the wrong thing entirely, optimizing for token consumption rather than outcomes. Most are operating, as he puts it, "in the fog of war, driven by the fear of missing out."

The pressure to ship AI-generated code is outrunning the ability to measure what it's actually delivering or to manage its impact on production, even as the process of writing that code has never been faster.

The Bottleneck Has Moved Downstream

AI now generates or assists in writing 61% of the average enterprise codebase, and 52% of organizations report a significant increase in software development output as a result. Agentic development—where AI agents write, iterate on, and increasingly manage code with minimal human intervention—has made generation faster than ever.

Shawn asks the question on everyone's mind: "If we've removed the constraint from one side, where has it gone?"

"When code is infinite, we can build anything. It's exhilarating," says Phil Nash, Developer Relations Engineer at IBM.

But as Phil puts it, "The marginal cost of a line of code was never just the time it took to write the code. It is the total of the planning, the analysis, the design, the coding, the review, the deployment, and the maintenance after it."

According to the survey results, more leaders now identify post-code stages (review, testing, deployment) as their primary bottleneck, with 70% saying that maintaining their test suite is now a greater burden than writing code.

Loreli Cadapan, CloudBees VP of Product, makes the point that the question in code review has fundamentally changed.

"The question is no longer, ‘Is this good code?’ It's starting to shift more into the question of, ‘Do we actually understand what this code is doing?’"

When AI writes the code and a human merges it, the line of ownership blurs. When the code review question changes, so does the accountability model. According to the survey, just 12% of organizations have a dedicated AI governance team, and when something breaks, 46% say responsibility defaults to the CTO or VP of Engineering.

"The burden is now on the engineering or the security group to ensure that the right policies are set in place," Loreli says.

CloudBees VP of Engineering, Anthony Aquilio, explains what that requires in practice.

"My job is really to work with my teams to understand what those rules are, what the governance is, and then try to build the checks and balances in place as much as we can."

One tempting shortcut is to let AI verify the code it wrote, but an LLM has no reliable way to know whether that code actually behaves correctly in production.

"If you just blindly trust AI to manage all the testing for you, I've seen firsthand where that can get you into trouble; where everything is green, it looks great, but then you're getting feedback from your users that things aren't so great," Anthony adds.

The validation problem is one dimension of the challenge; adoption is another.

When Fidelity Investments, one of the world's largest financial institutions, rolled out AI coding tools broadly, the initial response was enthusiastic. And not just among engineers; non-technical staff wanted in too.

But according to Gerard McMahon, Head of ALM Tools and Platforms, that enthusiasm didn't last.

"There was a very initial peak of high usage, and then it tracked back over. We were seeing that drop off across 12-14,000 people," he says.

The tools were powerful, but engineers couldn't make them work in their actual day-to-day work.

"The initial playing and the initial satisfaction from experimenting; they were unable to translate that into their work and into the products and applications that they support," Gerard says.

Successful AI Adoption Needs a Foundation

Fidelity Investments' response to their adoption dilemma was to invest heavily in upskilling and education, putting 20,000 engineers and developers through a continuous program of workshops, hackathons, and office hours.

And it worked: Usage climbed back up considerably.

But as engineers became more proficient with the tools, the abundance of AI-generated code began to create the bottleneck mentioned earlier.

"While code was getting created at a quicker pace, everything else behind that was actually slowing down," Gerard says.

"If you get a productivity increase on the left side of the pull request—10, 15, 20%—are you seeing relatively the same productivity increases on the right side? We were not," he added. Senior engineers absorbed a disproportionate share of the review load, iterations multiplied, and productivity gains turned negative.

According to our survey, most organizations aren't measuring carefully enough to catch this: 54% track AI productivity primarily by time saved, a metric that tells you how busy your teams are, not whether they're delivering value.

Fidelity's response was to measure across the full SDLC rather than just at the point of generation. They introduced four metrics: pull requests, artifacts created through CI builds, artifacts promoted as production candidates, and production deployments.

"These four measures will allow us to see if there's churn or friction within the system, but also, we can see if that value is being carried through," Gerard says.
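One way to read those four measures is as a funnel, where the ratio between neighboring stages shows how much work carries through. The following is a minimal sketch of that idea, not Fidelity's tooling: the stage identifiers, the `PipelineCounts` type, and both function names are hypothetical, and any counts passed in are placeholders.

```python
from dataclasses import dataclass

# The four stages mirror the measures named above. All counts fed in are
# illustrative placeholders, not figures from Fidelity or the survey.
STAGES = (
    "pull_requests",
    "ci_artifacts",
    "production_candidates",
    "production_deployments",
)


@dataclass(frozen=True)
class PipelineCounts:
    pull_requests: int
    ci_artifacts: int
    production_candidates: int
    production_deployments: int


def stage_conversion(counts: PipelineCounts) -> dict[str, float]:
    """Share of each stage's volume that shows up at the next stage."""
    rates = {}
    for src, dst in zip(STAGES, STAGES[1:]):
        upstream = getattr(counts, src)
        downstream = getattr(counts, dst)
        # Guard against an empty upstream stage rather than dividing by zero.
        rates[f"{src}->{dst}"] = downstream / upstream if upstream else 0.0
    return rates


def weakest_handoff(counts: PipelineCounts) -> str:
    """Name the handoff with the lowest conversion, the likeliest friction point."""
    rates = stage_conversion(counts)
    return min(rates, key=rates.get)
```

With made-up weekly counts of 200 pull requests, 150 CI artifacts, 60 production candidates, and 54 deployments, `weakest_handoff` points at the step from CI artifacts to production candidates, where only 40% carries through. A real pipeline may need a different mapping between stages (one pull request can yield several artifacts), so treat the ratios as a prompt for investigation rather than a score.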

Measuring across the pipeline can show you where the friction is, but building the right tooling is what removes it. IBM’s Phil Nash shared several examples of teams doing both.

At 1Password, engineers used agents to build a program that analyzed the structure of their monolithic codebase, then used that analysis to execute more than 3,000 pattern migrations.

"They spent a lot of time upfront planning the refactor, building tools to help them with it, and then letting the agent do the work. That planning meant they had the confidence to move the changes through the rest of the pipeline," Phil explains.

At Intercom, a PR review agent with clear guidelines and size limits auto-approved 19.2% of pull requests, taking a fifth of the review backlog off human plates entirely.

"Given the size limits, it encouraged developers at Intercom to start submitting smaller, more incremental, well-scoped changes, which is funny, because that's something we've been trying to get developers to do for years," Phil says, before turning to IBM's own approach.

Their internal coding agent, Bob, takes that idea further, proactively flagging issues before a developer even commits.

"When you are working within Bob, there's just a little tab that can come up to say, ‘Hey, Bob's found some stuff that he thinks you should look at’; this is outside of the interactions and collaboration that you're having with it. It's really useful."

Investing in the tooling around code generation is emerging as the clear path forward, and the survey data reflects it. 53% of organizations report a significant increase in testing, security, and deployment costs over the past 12 months — likely driven, in part, by teams trying to get a clearer picture of what their AI investment is actually delivering. Only 51% are very confident they can accurately measure AI productivity gains and ROI.

For teams that want to take a similar approach without building from scratch, Phil highlighted the CloudBees open-sourced DevOps Agent Kit, scaffolding that lets any AI coding agent plug into your full delivery stack, with context from CI, security scans, release workflows, and feature flags built in. It ships read-only by default, with every write action requiring explicit human confirmation, so governance is built in from day one rather than bolted on later.
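The read-only-by-default pattern can be illustrated in general terms. The sketch below is not the DevOps Agent Kit's API; `Action`, `WriteNotConfirmed`, `deny_all`, and `run_action` are hypothetical names chosen only to show the shape of a gate that lets reads through and blocks writes until a human approves them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    writes: bool  # True when the action would change state (deploy, merge, flag flip)


class WriteNotConfirmed(PermissionError):
    """Raised when a write action is attempted without explicit human approval."""


def deny_all(action: Action) -> bool:
    """Default confirmer: nothing is approved, so the agent stays read-only."""
    return False


def run_action(
    action: Action,
    execute: Callable[[], str],
    confirm: Callable[[Action], bool] = deny_all,
) -> str:
    """Run reads freely; run writes only after the confirmer approves them."""
    if action.writes and not confirm(action):
        raise WriteNotConfirmed(f"'{action.name}' needs human confirmation")
    return execute()
```

Because the default confirmer rejects everything, a caller has to opt in to writes deliberately, for example by passing a `confirm` callback that prompts a person. Forgetting to wire up confirmation then fails closed instead of open.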

Now Is the Time to Get the Infrastructure Right

Both the survey data and the real-world stories from practitioners confirm that AI-generated code is here to stay, and that the risks are just as real as the productivity gains.

For leaders navigating the pressure from boards and investors, CloudBees CEO Anuj Kapur advises "creating room for experimentation, room for failure, and really room for people to redefine what they do, how they do it, and what the business outcomes are they can create for end customers."

That starts with being honest about what the investment is actually delivering.

"ROI is measured both in terms of return on investment as well as risk on investment," Anuj says.

The bigger point, for Anuj, is that AI has fundamentally changed the competitive landscape. The traditional advantages that large, established companies relied on are no longer as durable as they once were.

"AI redefines the starting line. It's actually never been a better time to be a disruptor."

Getting there requires the right foundation. For many teams, every tool in the delivery stack sees its own slice of the picture, but none of them can tell you whether you're actually ready to ship. CloudBees Unify AI is the control plane that closes that gap, giving agents the context they need to act safely by connecting pipeline data, governance policies, and delivery history into a single view. Without it, agents operate in silos. With it, they become trustworthy participants in your delivery pipeline.

To see every conversation from the summit in full, watch Agentic DevOps World 2026 on demand. To dive deeper into the data, download the State of Code Abundance 2026 report.
