INDUSTRY · JUNE 8, 2026 · 5 MIN READ

60% Ship Untested. Uber Spent It All. The Data Is In.

Tricentis surveyed 2,501 leaders in April 2026 and found 60% of organizations are shipping untested code. Uber burned its entire AI budget in four months. The numbers confirm what had been anecdotal.



The argument that AI coding tools degrade software quality has moved from conference-talk opinion to survey data. Tricentis published its 2026 Quality Transformation Report this month, drawing on 2,501 respondents across six countries: 60% of global organizations are shipping untested code. At roughly the same moment, Uber disclosed it had consumed its entire 2026 AI budget by April and capped per-engineer spend at $1,500 per month per tool. Two separate signals, same structural problem.

What the Tricentis data actually says

The 60% figure is striking, but the cause breakdown is more useful than the headline number. In 2025, Tricentis found similar rates of untested code deployment and attributed them mostly to accidental quality slips, cited by 40% of respondents. In 2026, organizations are admitting the shortcuts are deliberate: 32% cite leadership pressure to prioritize speed, and 30% say the volume of AI-generated code has become too large to test fully.

That second driver is the one to watch. It is not a process failure or a skills gap. It is an arithmetic problem. Generation throughput has grown faster than review and test capacity, and organizations have responded by cutting the quality step rather than the generation step. Financial services reached 64% shipping untested code; retailers hit 63%; energy and utilities, 58%. No sector is below 50%.
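The arithmetic can be made concrete with a toy model. The sketch below assumes a flat monthly test capacity and a fixed month-over-month growth rate for generated changes; both numbers are invented for illustration and do not come from the Tricentis report, and the helper name `untested_share` is hypothetical.

```python
# Toy model (all numbers hypothetical): when generation outgrows a fixed
# review/test capacity, the untested share of each period's output grows.

def untested_share(generated: float, test_capacity: float) -> float:
    """Fraction of a period's generated changes that cannot be tested."""
    if generated <= 0:
        return 0.0
    untested = max(0.0, generated - test_capacity)
    return untested / generated


# Assumed: test capacity stays flat at 100 changes per month while
# generation grows 20% month over month.
capacity = 100.0
generated = 100.0
for month in range(1, 7):
    share = untested_share(generated, capacity)
    print(f"month {month}: generated={generated:.0f}, untested={share:.0%}")
    generated *= 1.2
```

Because capacity is held flat, the untested share rises every month that generation grows, even though no team got any worse at testing.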

The financial exposure the report attaches to this is concrete. One in five organizations is losing more than $1 million annually due to poor software quality, driven primarily by security and compliance failures (30%) and technical debt and rework (28%). Nearly half estimate losses between $500,000 and $1 million.

The Uber budget collapse in numbers

Uber gave 5,000 engineers Claude Code in December 2025. By March 2026, 84% of engineers were classified as agentic coding users, up from 32% in February. Monthly API costs per engineer ran $500 to $2,000. The entire annual budget was gone before the second quarter started.
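A back-of-the-envelope check on those figures, as a minimal sketch. It assumes all 5,000 engineers incur the reported per-engineer cost every month, so it is an upper-bound style estimate; Uber's actual budget figure is not given in the reporting, and the function name is hypothetical.

```python
# Rough org-wide spend from the reported figures:
# 5,000 engineers at $500 to $2,000 of API cost per engineer per month.
# Assumption: every engineer incurs that cost every month.

ENGINEERS = 5_000
LOW_PER_ENGINEER = 500     # USD per engineer per month
HIGH_PER_ENGINEER = 2_000  # USD per engineer per month


def monthly_org_spend(engineers: int, per_engineer: float) -> float:
    """Total monthly spend if every engineer costs `per_engineer` USD."""
    return engineers * per_engineer


low = monthly_org_spend(ENGINEERS, LOW_PER_ENGINEER)
high = monthly_org_spend(ENGINEERS, HIGH_PER_ENGINEER)
print(f"monthly: ${low:,.0f} to ${high:,.0f}")
print(f"four months: ${4 * low:,.0f} to ${4 * high:,.0f}")
```

With these inputs, spend comes to $2,500,000 to $10,000,000 per month, or $10,000,000 to $40,000,000 over a four-month window.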

The company's response was a $1,500-per-employee-per-tool monthly cap, tracked through an internal dashboard, with management approval required to exceed it. That is a spending control, not a quality control. It does not change what reaches production; it changes how much generation the budget will tolerate.
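The cap as described can be sketched as a simple gate. This is a hypothetical illustration, not Uber's internal dashboard logic, which has not been published; `SpendRequest`, `allow`, and the sample engineer and tool names are invented for the example. The only inputs are dollar amounts and an approval flag: nothing in the check looks at whether the generated code was reviewed or tested.

```python
from dataclasses import dataclass

MONTHLY_CAP_USD = 1_500  # per employee, per tool, per month (as reported)


@dataclass
class SpendRequest:
    """Hypothetical record of one engineer's spend on one tool."""
    engineer: str
    tool: str
    spent_this_month: float  # USD already consumed on this tool this month
    requested: float         # USD the next request would add
    manager_approved: bool = False


def allow(req: SpendRequest, cap: float = MONTHLY_CAP_USD) -> bool:
    """Allow spend that stays under the cap, or above it with approval."""
    if req.spent_this_month + req.requested <= cap:
        return True
    return req.manager_approved


print(allow(SpendRequest("alice", "claude-code", 1_400, 50)))  # True
print(allow(SpendRequest("alice", "claude-code", 1_480, 50)))  # False
print(allow(SpendRequest("alice", "claude-code", 1_480, 50, manager_approved=True)))  # True
```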

COO Andrew Macdonald stated plainly at a May town hall that "it's very hard to draw a line between one of those stats and, 'Okay, now we're actually producing 25 percent more useful consumer features.'" Uber still reports that 70% of its committed code originates from AI and that 95% of engineers use AI tools monthly. The generation pipeline did not slow; what changed was the budget ceiling.

The gap between executive confidence and engineering reality

The Tricentis data surfaces a specific organizational failure mode. 81% of CEOs report high confidence in AI-driven systems. Only 56% of QA and DevOps professionals share that confidence. 44% of C-level executives believe their organization is well prepared to govern AI agents across the software development lifecycle; 23% of QA and DevOps professionals agree.

This is not a communication problem. It is a measurement problem. Inputs such as tokens consumed, code committed, and sprint velocity are visible in dashboards. Output quality is not. The Uber case makes this explicit: Macdonald acknowledged that the company needs new metrics that capture quality and business impact rather than volume, and that work is happening after the budget was already spent.

The 33% of teams in the Tricentis survey citing tool complexity and sprawl as a barrier to quality are describing the same gap from a different angle. More tools, more generation, less visibility into what any of it produces.
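One way to picture the measurement gap is to compute input-side and output-side numbers from the same set of merged changes. The sketch below is hypothetical: the `Change` fields, the 30-day rework window, and the sample data are invented for illustration and are not metrics Uber or Tricentis report.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One merged change. Field names are hypothetical."""
    ai_generated: bool  # originated from an AI coding tool
    tested: bool        # passed automated tests before merge
    reworked: bool      # reverted or patched again within 30 days


def volume_metrics(changes: list[Change]) -> dict[str, float]:
    """Input-side numbers that are easy to put on a dashboard."""
    if not changes:
        raise ValueError("no changes to measure")
    n = len(changes)
    return {
        "changes_merged": float(n),
        "ai_share": sum(c.ai_generated for c in changes) / n,
    }


def quality_metrics(changes: list[Change]) -> dict[str, float]:
    """Output-side numbers that volume dashboards typically omit."""
    if not changes:
        raise ValueError("no changes to measure")
    n = len(changes)
    return {
        "tested_share": sum(c.tested for c in changes) / n,
        "rework_rate": sum(c.reworked for c in changes) / n,
    }


# Ten invented changes: six AI-generated, four tested, three reworked.
sample = [Change(ai_generated=i < 6, tested=i < 4, reworked=i >= 7) for i in range(10)]
print(volume_metrics(sample))   # {'changes_merged': 10.0, 'ai_share': 0.6}
print(quality_metrics(sample))  # {'tested_share': 0.4, 'rework_rate': 0.3}
```

Both functions read the same records; the difference is only which fields a dashboard chooses to surface.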

Why capping spend does not close the review gap

The bottleneck has shifted. Generating code is no longer the constraint. As covered in "The bottleneck moved," the cost of writing code fell to near zero; the cost of trusting it did not. A $1,500 spending cap addresses token burn. It does nothing for the review queue that forms downstream.

The Tricentis finding that 30% of organizations cite AI volume as too large to test is a statement about review capacity, not generation capacity. Reducing generation via spending caps is one response. Scaling review to match generation is another. These are not equivalent. Caps shrink output; scaled review maintains output while restoring the quality gate.
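The difference between the two responses can be shown with a toy model. All numbers are hypothetical, and the model assumes every generated change ships, with anything beyond review capacity shipping unreviewed; `outcome` is an illustrative name.

```python
# Toy comparison (hypothetical numbers): capping generation vs scaling review.

def outcome(generated: float, review_capacity: float) -> tuple[float, float]:
    """Return (changes shipped after review, changes shipped unreviewed).

    Assumes every generated change ships; anything beyond review
    capacity ships without review.
    """
    reviewed = min(generated, review_capacity)
    return reviewed, generated - reviewed


baseline_gen, baseline_review = 200.0, 100.0

scenarios = {
    "status quo": (baseline_gen, baseline_review),
    "cap generation": (baseline_gen * 0.5, baseline_review),
    "scale review": (baseline_gen, baseline_review * 2),
}
for name, (gen, capacity) in scenarios.items():
    reviewed, unreviewed = outcome(gen, capacity)
    print(f"{name:15} reviewed={reviewed:.0f} unreviewed={unreviewed:.0f}")
```

Both interventions bring unreviewed changes to zero in this model, but capping generation halves total shipped output (100 vs 200), while scaling review keeps it at 200.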

Hyrax operates on the review side of that equation. It reads the full codebase, runs fixes across six agent domains, applies a 13-step verification process, and submits the PR. Engineers merge. The review does not depend on headcount, and it does not slow when generation volume increases.

What comes next for enterprises in this position

The structural incompatibility between consumption-priced AI tools and annual fixed budgets is the near-term procurement problem. Uber learned it early and loudly. Most enterprises are still building FY26 AI line items on assumptions that Uber's experience has already invalidated.
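The mismatch can be sketched as a cumulative-spend check against a fixed annual budget. The budget and growth curve below are hypothetical, chosen only to show how a budget planned as a flat monthly line runs out early when consumption compounds; they are not Uber's figures, and `month_budget_exhausted` is an illustrative name.

```python
from typing import Optional


def month_budget_exhausted(annual_budget: float,
                           monthly_spend: list[float]) -> Optional[int]:
    """Return the 1-based month in which cumulative spend reaches the
    annual budget, or None if the budget lasts the whole period."""
    cumulative = 0.0
    for month, spend in enumerate(monthly_spend, start=1):
        cumulative += spend
        if cumulative >= annual_budget:
            return month
    return None


ANNUAL_BUDGET = 12_000_000  # hypothetical: planned as a flat $1M per month

planned = [1_000_000.0] * 12
# Hypothetical consumption curve: usage grows 50% month over month.
actual = [1_000_000.0 * 1.5 ** m for m in range(12)]

print(month_budget_exhausted(ANNUAL_BUDGET, planned))  # 12
print(month_budget_exhausted(ANNUAL_BUDGET, actual))   # 5
```

Under the flat plan the budget lasts exactly twelve months; under the compounding curve cumulative spend passes $12M in month five.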

The quality problem is less tractable than the budget problem. Budgets can be restructured. Review capacity requires either more reviewers or automated review that keeps pace with generation. The Tricentis data (60% shipping untested code, with a rising share doing so knowingly) suggests most organizations have chosen to defer the quality problem rather than solve it. That deferral has a cost: security and compliance failures (30%) and technical debt and rework (28%) are the leading drivers of the quality losses the report quantifies.

None of this is hypothetical anymore. The data is in.

Hyrax is live at hyrax.dev.


Sources

  1. morningstar.com (Tricentis)
  2. aichatdaily.com
  3. businesstech.news
  4. techtimes.com