INDUSTRY · JUNE 15, 2026 · 5 MIN READ

Reviewer Attention Is the Bottleneck, Not the Tooling

AI coding made code generation cheap. It did not make understanding cheap. The review queue is an attention-allocation problem, and tooling alone won't fix it.



The review queue is not a tooling problem. It is an attention-allocation problem. Several independent engineering teams reached the same diagnosis in the span of two weeks in June 2026, and their findings converge on a single uncomfortable fact: AI coding assistants moved the constraint from writing to reading, and almost no engineering process has adapted.

The constraint shifted. The policies did not.

Before AI coding assistants, the cost of writing a change and the cost of reviewing one were roughly balanced. Both were bound by an engineer's available hours. That balance is gone.

Nate Goethel at Mews described the failure mode precisely: "Writing code is cheap. AI review is cheap. Getting a human to sign off with real attention is the new constraint, and our SDLC policies haven't adapted to that." Most organizations still require comprehensive human review on every change, regardless of how many automated reviewers have already processed it. Authoring throughput climbs. The queue piles up at the one stage that hasn't moved: the human approval requirement.

Yuval Yeret frames the same problem from a Theory of Constraints angle. AI coding increased the arrival rate of pull requests faster than teams can drain the queue. The result is rubber-stamped approvals and cognitive load concentrated on the handful of engineers who actually understand the system. That is not a failure of AI coding. It is what happens when a constraint moves and the process doesn't.
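The arrival-rate argument can be made concrete with a minimal sketch of a deterministic queue. The function name and the daily rates below (10 or 15 PRs opened, 10 reviewed) are illustrative assumptions, not figures from Yeret or from any team cited here.

```python
def backlog_over_time(arrivals_per_day: float,
                      reviews_per_day: float,
                      days: int,
                      start: float = 0.0) -> list[float]:
    """Return the end-of-day review backlog for a simple fluid queue.

    Hypothetical model: PRs arrive and are reviewed at constant daily rates,
    and the backlog can never drop below zero.
    """
    backlog = start
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + arrivals_per_day - reviews_per_day)
        history.append(backlog)
    return history


# Assumed numbers: review capacity is fixed at 10 PRs per day.
balanced = backlog_over_time(arrivals_per_day=10, reviews_per_day=10, days=5)
overloaded = backlog_over_time(arrivals_per_day=15, reviews_per_day=10, days=5)

print(balanced)    # [0.0, 0.0, 0.0, 0.0, 0.0]
print(overloaded)  # [5.0, 10.0, 15.0, 20.0, 25.0]
```

Under these assumptions the backlog grows without bound as soon as arrivals exceed review capacity, no matter how fast the authoring side gets.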

Numbers from a real experiment

Mews ran a controlled pilot. They built a bot called Moxly on Claude Code that assigned every PR to one of four risk tiers, then changed the process: on low-risk PRs, reviewers were explicitly permitted to skip line-by-line review and assess at the conceptual level instead.
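The process change amounts to a small policy table keyed on risk tier. The sketch below is one way such a rule might be written. The tier names and the `required_review_depth` function are hypothetical: the pilot is described here only as having four tiers with relaxed review on the lowest one.

```python
from enum import Enum


class RiskTier(Enum):
    # Hypothetical tier names; the source only says there are four tiers.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class ReviewDepth(Enum):
    CONCEPTUAL = "conceptual"
    LINE_BY_LINE = "line-by-line"


def required_review_depth(tier: RiskTier) -> ReviewDepth:
    """Only low-risk PRs may be assessed at the conceptual level."""
    if tier is RiskTier.LOW:
        return ReviewDepth.CONCEPTUAL
    return ReviewDepth.LINE_BY_LINE


for tier in RiskTier:
    print(tier.name, "->", required_review_depth(tier).value)
```

Keeping the rule in code rather than in team folklore makes the permission explicit, which is the part of the pilot that changed reviewer behavior.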

The queue compressed. P75 time-to-first-approval dropped from 70 hours to 44 hours. The share of PRs merged within 24 hours rose from 51% to 70%. The long tail, PRs older than 72 hours with no approval, was cut roughly in half, from 24% of the queue to 11%.

The counter-signal matters more. The rate of approvals arriving within one hour of a PR being opened, the timing signature of rubber-stamping, fell from 22% to 10%. The queue moved faster and, by that proxy, review quality held. What the process change actually did was give reviewers permission to spend their attention where it belonged.
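Both headline metrics are straightforward to compute from a list of time-to-first-approval values. The sketch below uses invented sample data and a nearest-rank percentile, which is one of several common conventions. How Mews actually computed P75 is not stated, so treat the method and the function names as assumptions.

```python
import math


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def fast_approval_share(hours_to_first_approval: list[float],
                        threshold_hours: float = 1.0) -> float:
    """Share of PRs whose first approval arrived within the threshold."""
    fast = sum(1 for h in hours_to_first_approval if h <= threshold_hours)
    return fast / len(hours_to_first_approval)


# Invented sample: hours from a PR entering the queue to its first approval.
sample_hours = [0.5, 2.0, 6.0, 20.0, 30.0, 44.0, 70.0, 90.0]

p75 = percentile_nearest_rank(sample_hours, 75)
share = fast_approval_share(sample_hours)

print(f"P75 time-to-first-approval: {p75} h")  # 44.0 h
print(f"Approved within 1 h: {share:.1%}")     # 12.5%
```

Tracking the two together is the point: a falling P75 alongside a rising one-hour share would suggest the queue is draining through rubber-stamping rather than through better attention allocation.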

Verification is now the economic bottleneck

The TypeDB team put a sharper point on why this is structural, not incidental: "Generation is cheap. We delegate this to LLMs. Verification is expensive. We delegate this to humans. Human attention is scarce, expensive, and easily fatigued." An LLM generating a complex API endpoint in seconds doesn't save time if it takes 30 minutes to verify it didn't hallucinate an edge case.

Jane Street's Yaron Minsky made the same point from a more surprising direction. After 25 years of telling people formal methods weren't worth the cost, Minsky now argues that agentic coding has rewritten the math. The firm is building a team to make mathematical proof as routine as type checking. The reasoning: when agents generate code at volume, the verification burden becomes so large that investing in machine-checkable correctness pays off in ways it never did when humans were writing everything by hand.

The underlying observation is the same across all of these analyses. Understanding did not get cheaper when writing did.

What actually exhausts reviewers

The structural diagnosis from Joren Verhoeks on Dev.to is worth stating directly: "The deepest cause of review fatigue is that we review the wrong artifact." Generated code is the cheap output of an expensive decision. Reading it line by line means spending scarce human attention on the thing that cost nothing to produce, while the things that matter (the intent, the interface change, the design decision) pass through buried in a 2,000-line diff.

Amdahl's Law applies here without modification. If generation throughput rises 5 to 10 times and verification throughput stays flat, the system does not get 5 to 10 times faster. It gets a queue. The queue is the review backlog, and the engineer draining it burns out. More tooling on the authoring side makes this worse, not better.
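A worked example shows how hard the ceiling is. The sketch below applies the standard Amdahl's Law formula under one assumption taken from the "roughly balanced" premise earlier in this piece: writing and reviewing each account for half of the end-to-end time per change. The 50% split is illustrative, not a measured figure.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work gets `factor` times faster."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Assumption: writing is 50% of the time per change, review is the other 50%.
WRITING_FRACTION = 0.5

for factor in (5, 10):
    speedup = amdahl_speedup(WRITING_FRACTION, factor)
    print(f"writing {factor}x faster -> end-to-end {speedup:.2f}x")
# writing 5x faster -> end-to-end 1.67x
# writing 10x faster -> end-to-end 1.82x
```

With review untouched, the end-to-end speedup can never reach 2x under this split, however fast generation becomes. Everything beyond that ceiling shows up as backlog.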

Where autonomous review fits

The defenses available to engineering teams fall into two categories. The first is process restructuring: spec-before-code, Architecture Decision Records as standing doctrine for agents, contract tests pinning every published interface, policy-as-code encoding the judgment that used to live in a senior engineer's head. These reduce the volume of changes that require deep review and make each remaining change cheaper to assess. The Mews experiment is the clearest published evidence that this works.

The second is absorbing the mechanical review load entirely, so human attention never sees the changes that don't warrant it. This is what Hyrax does. It reads the full codebase, runs findings across six domains covering security, code quality, reliability, API and data integrity, ops, and UX, executes 13 verification steps, then submits a fix PR. The engineer reviews the PR and merges. Hyrax never auto-merges. The point is not to eliminate human judgment. It is to make sure human judgment isn't being spent on the 90% of changes that a machine can assess, verify, and fix without a human in the loop at all.

The Mews pilot showed that telling a reviewer "this is safe to skim" was worth more than expected, not because the bot was always right, but because it let each engineer build their own version of trust on top of a reliable signal. Autonomous review takes that one step further: the changes that don't deserve human eyes never reach the queue.

As Verhoeks puts it, the teams that win with agents will not be the ones with the best prompts. They will be the ones with the best boundaries, and the clearest sense of where machine judgment ends and human judgment begins. Right now, most teams have neither. The review queue is the evidence.

Hyrax is live at hyrax.dev.


Sources

  1. developers.mews.com
  2. dev.to/jverhoeks
  3. typedb.com
  4. yuvalyeret.com
  5. lavx.hu