RESEARCH · JUNE 12, 2026 · 6 MIN READ
Nearly Half of Agent PRs Get Rejected: What the Data Says and Why
New benchmark data puts agent PR merge rates in the 60% range. Here is what drives those rejections, and why autonomous review has become a cost line.
Roughly four in ten pull requests submitted with heavy agentic involvement are rejected rather than merged. That is not a rounding error or a pilot-program artifact. It is a structural cost that compounds every sprint, on every team, at every adoption level. The review cycles, CI runs, and human attention absorbed by rejected PRs are the hidden price of agent-speed generation without agent-quality governance.
The Numbers
Jellyfish's production benchmark, covering roughly 250,000 developers and 40 million data points across approximately 1,000 enterprise companies, paints a clear picture: merge rates on AI-assisted PRs have dropped from ~80% to ~60% as agents have taken on a larger share of total diffs. Jellyfish reported a 260% year-over-year increase in AI coding usage from June 2024 to May 2026, and the share of PRs with high AI involvement jumped from 14% to 51% in roughly 12 months.
The headline: roughly four in ten agent-heavy PRs do not survive review.
Why Agent PRs Fail
The failure modes are well-documented at this point. AI policy desk's review catalogues six recurring patterns: hallucinated APIs where the function name looks real but the package does not exist; plausible-but-wrong logic that passes a casual read and fails on edge cases; decorative security checks that are present but bypassable; tests that assert what the code does rather than whether it should; scope creep where the agent refactors unrelated files; and confident comments describing intended behavior while the code does something else.
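The fourth pattern, tests that assert what the code does rather than whether it should, is easy to miss because both kinds of test pass. Here is a minimal Python sketch using a hypothetical `apply_discount` function (not drawn from the review cited above). The first test recomputes its expected value with the implementation's own formula. The second pins hand-computed values and an edge case the requirement cares about.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function: return `price` reduced by `percent` percent."""
    if not 0 <= percent <= 100:
        # Reject discounts that would make the price negative or raise it.
        raise ValueError("percent must be between 0 and 100")
    return price - price * percent / 100


def test_mirrors_implementation():
    # Decorative: the expected value is recomputed with the implementation's
    # own formula, so a wrong formula copied into the test would still pass.
    price, percent = 80.0, 25.0
    assert apply_discount(price, percent) == price - price * percent / 100


def test_specifies_behavior():
    # Behavioral: expected values are worked out by hand from the requirement.
    assert apply_discount(80.0, 25.0) == 60.0
    assert apply_discount(80.0, 0.0) == 80.0
    assert apply_discount(80.0, 100.0) == 0.0
    # Edge case: an out-of-range discount must be rejected, not silently applied.
    try:
        apply_discount(80.0, 150.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for percent > 100")


if __name__ == "__main__":
    test_mirrors_implementation()
    test_specifies_behavior()
    print("both tests pass")
```

Both tests pass against this implementation. The difference appears when the formula is wrong: an agent that writes the wrong formula tends to restate it in the mirroring test, which keeps passing, while the behavioral test fails.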
The hallucinated API problem deserves particular attention. An agent working without access to a project's actual dependency manifest will invent plausible method signatures. The code compiles against a stub, passes a surface review, and breaks at runtime. This is not a model quality failure in any simple sense. It is a context failure.
Context Debt Is the Root Cause
The emerging consensus among teams that have improved merge rates is that instruction files (CLAUDE.md, AGENTS.md, or equivalent repo-level configuration) materially reduce rejection rates. The mechanism is straightforward: agents given explicit context about package managers, test runners, security-sensitive paths, and architectural constraints make fewer assumptions. Fewer assumptions mean fewer hallucinated dependencies, fewer tests that pass only because they mirror the code, and fewer scope-creep refactors nobody asked for.
Instruction files do not fix model quality. They reduce context debt. That distinction matters because context debt is a fixable repo property, while model quality is not under a team's control.
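Because context debt is a repo property, it can be checked like one. The sketch below is a hypothetical lint that reports which of the context areas named above an instruction file never mentions. Both formats are free-form Markdown, so the topic names and keyword lists here are assumptions a team would tune to its own conventions.

```python
from pathlib import Path

# Hypothetical checklist: the context areas said to reduce agent assumptions,
# each with keywords suggesting the instruction file covers it.
REQUIRED_TOPICS: dict[str, tuple[str, ...]] = {
    "package manager": ("npm", "pnpm", "yarn", "pip", "poetry", "cargo", "go mod"),
    "test runner": ("pytest", "jest", "vitest", "go test", "cargo test"),
    "security-sensitive paths": ("security", "sensitive", "auth", "secret"),
    "architectural constraints": ("architecture", "layer", "boundary", "do not import"),
}


def missing_topics(text: str) -> list[str]:
    """Return checklist topics with no matching keyword in the instruction text."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in REQUIRED_TOPICS.items()
        if not any(keyword in lowered for keyword in keywords)
    ]


def lint_instruction_file(path: Path) -> list[str]:
    """Read an instruction file such as AGENTS.md and report uncovered topics."""
    return missing_topics(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = (
        "Use pnpm for installs and `pnpm vitest` for tests.\n"
        "Never edit src/auth/ without a human reviewer.\n"
    )
    print(missing_topics(sample))  # ['architectural constraints']
```

Plain substring matching is deliberately crude (`pip` also matches `pipeline`); the point is that coverage of repo context can be versioned and checked in CI rather than assumed.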
What the Drop Costs in Practice
Jellyfish's data shows that teams using agentic tools are shipping approximately 2x as many pull requests as before adoption. Pair that with a merge rate that has fallen 20 percentage points and the math becomes uncomfortable. A team that generated 100 PRs per month before agents is now generating ~200. At a ~60% merge rate, roughly 80 of those will not merge; before agents, at ~80%, perhaps 20 failed to merge. The review capacity required to process the additional rejected PRs is not free. CI runs on closed PRs are not free. Reviewer attention absorbed by a rejected diff is attention not spent on the diffs that do ship.
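The arithmetic in the worked example can be written out so a team can substitute its own numbers. The PR volumes and merge rates come from the example in the text; the 1.5 reviewer-hours per rejected PR and the hourly cost are placeholder assumptions, not figures from the Jellyfish data.

```python
from dataclasses import dataclass


@dataclass
class PrScenario:
    prs_per_month: int
    merge_rate_pct: int  # percentage of PRs that merge, e.g. 60

    def rejected(self) -> float:
        """PRs per month that are closed without merging."""
        return self.prs_per_month * (100 - self.merge_rate_pct) / 100


# Volumes and merge rates from the worked example in the text.
before = PrScenario(prs_per_month=100, merge_rate_pct=80)
after = PrScenario(prs_per_month=200, merge_rate_pct=60)
extra_rejections = after.rejected() - before.rejected()

# Placeholder assumptions, NOT benchmark data: substitute your team's own figures.
REVIEW_HOURS_PER_REJECTED_PR = 1.5
COST_PER_REVIEW_HOUR = 100.0

extra_review_hours = extra_rejections * REVIEW_HOURS_PER_REJECTED_PR
extra_cost = extra_review_hours * COST_PER_REVIEW_HOUR

print(f"rejected per month: {before.rejected():.0f} before, {after.rejected():.0f} after")
print(f"extra rejected PRs: {extra_rejections:.0f}, extra review hours: {extra_review_hours:.0f}")
print(f"extra review cost per month (placeholder rates): ${extra_cost:,.0f}")
```

Storing the merge rate as an integer percentage keeps the counts exact (20 and 80 rather than float approximations). CI minutes spent on closed PRs could be added as another per-PR term in the same way.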
Bug-related PRs and rollback rates have stayed relatively flat, which is the good news. The quality that reaches production is holding. The waste is upstream of production, in the review layer, and it is growing faster than most teams' review capacity.
Why Autonomous Review Is Now a Budget Line
This is the context in which autonomous code review stops being a productivity convenience and starts being a financial argument. Every rejected PR costs real resources. If a meaningful share of those rejections is for predictable, catchable reasons (hallucinated dependencies, missing behavioral tests, decorative security checks, context mismatches), then catching them before they enter the human review queue has a measurable dollar value.
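A pre-review gate can be as simple as running cheap automated checks and routing a PR to a human only once they pass. The sketch below is a generic, hypothetical illustration of that ordering, not Hyrax's implementation or API; the `Finding` type, the two heuristic checks, and the routing labels are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    check: str
    message: str


# Hypothetical check signature: takes the changed file paths, returns findings.
Check = Callable[[list[str]], list[Finding]]


def missing_tests(paths: list[str]) -> list[Finding]:
    """Flag PRs that change Python source without touching any test file."""
    src = [p for p in paths if p.endswith(".py") and not p.startswith("tests/")]
    tests = [p for p in paths if p.startswith("tests/")]
    if src and not tests:
        return [Finding("missing_tests", f"{len(src)} source file(s) changed, no tests changed")]
    return []


def scope_spread(paths: list[str], max_dirs: int = 2) -> list[Finding]:
    """Flag PRs touching more top-level directories than expected (possible scope creep)."""
    dirs = {p.split("/")[0] for p in paths}
    if len(dirs) > max_dirs:
        return [Finding("scope_spread", f"touches {len(dirs)} top-level dirs: {sorted(dirs)}")]
    return []


def triage(paths: list[str], checks: list[Check]) -> tuple[str, list[Finding]]:
    """Run every check; only a PR with no findings goes to the human review queue."""
    findings = [finding for check in checks for finding in check(paths)]
    route = "human_review" if not findings else "return_to_author"
    return route, findings


if __name__ == "__main__":
    pr_paths = ["src/billing.py", "src/auth/session.py", "docs/readme.md"]
    route, findings = triage(pr_paths, [missing_tests, scope_spread])
    print(route)  # return_to_author
    for f in findings:
        print(f"{f.check}: {f.message}")
```

The design choice is ordering: cheap, deterministic checks run first so that reviewer time, the expensive resource in the cost model above, is spent only on PRs that have already cleared the predictable failure modes.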
Hyrax reviews every diff across multiple domains (security, code quality, reliability, API and data, ops, UX) and runs verification before proposing a fix PR. The human merges. That review happens on all code, not only agent-authored code, which matters because the failure modes above appear in human-authored diffs too. But it is the falling merge rate on agent PRs that makes the case quantitatively. At a merge rate near 60%, the cost of not reviewing before merge is no longer theoretical.
The post on five AI-code failures your CI does not catch describes specific failure patterns (decorative assertions, hallucinated imports, behavioral gaps) that standard CI pipelines miss precisely because the code is syntactically valid. Those patterns map directly onto the rejection reasons catalogued above.
What Changes If Instruction Files Become Standard
The instruction-file finding has an architectural implication worth naming. If CLAUDE.md or AGENTS.md reduces agent rejection rates by encoding repo context explicitly, then the codebase itself becomes part of the governance layer. Architecture decisions, dependency constraints, security-sensitive paths, test requirements: all of these can be committed as agent-readable files rather than left to be inferred from the codebase or passed in prompts.
This is not a novel idea. It is a precise formalization of what senior engineers do during code review: they enforce context the agent never had. Making that context explicit, versioned, and checked into the repository shifts governance upstream, before generation rather than after it. The merge-rate drop does not have to be permanent.
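As one illustration of committing context as a file that both agents and tooling can read, the sketch below assumes a hypothetical plain-text file listing glob patterns for security-sensitive paths (one per line, `#` for comments) and flags changed paths that match. The file format, its location, and the function names are invented for this example; `fnmatch` is Python's standard-library shell-style matcher, in which `*` also matches `/`.

```python
from fnmatch import fnmatch


def parse_sensitive_patterns(text: str) -> list[str]:
    """Parse one glob pattern per line, ignoring blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def sensitive_changes(changed_paths: list[str], patterns: list[str]) -> list[str]:
    """Return the changed paths that match any sensitive-path pattern."""
    return [p for p in changed_paths if any(fnmatch(p, pat) for pat in patterns)]


if __name__ == "__main__":
    # Hypothetical contents of a committed file such as .agent/sensitive-paths.txt
    committed = """
    # Auth and secrets handling: require human security review
    src/auth/*
    config/secrets*.yaml
    """
    diff = ["src/auth/session.py", "src/billing.py", "config/secrets.prod.yaml"]
    print(sensitive_changes(diff, parse_sensitive_patterns(committed)))
    # ['src/auth/session.py', 'config/secrets.prod.yaml']
```

The same file can be referenced from CLAUDE.md or AGENTS.md so the agent sees the constraint before generating, while the check enforces it after, which is the upstream shift described above.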
Hyrax is live at hyrax.dev.