Clean code in. Clean PRs out. Hyrax audits your codebase, surfaces findings across security, correctness, maintainability, performance, architecture, and operations, then writes fixes and opens PRs. You review and merge every PR.

How does pricing work?

One flat price per workspace, not per seat. Free: 1 private repo, 1 mini-audit per month, verified fixes, no card. Pro: $30/mo with $30 of usage included, up to 3 repos, full audit pipeline, opt-in overage after. Team: $200/mo flat with $200 of usage included, unlimited repos and seats. Included usage does not roll over.

What do I get with Pro vs Team?

Pro: up to 3 repos, the full audit + scan pipeline, PR reviews on every opened PR, and auto-publish to Linear. Team: unlimited repos and seats, plus the self-improvement learn loop and public repos by URL.

What is the 13-step verification?

Every fix runs through 13 steps before a PR opens: test baseline, fix agent, diff size guard, test regression, build, auto-format, lint, cross-project test, scanner quality loop, review loop, post-fix audit, detection query verify, push and PR. A failure at any critical step aborts the run.
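The abort-on-critical-failure behavior described above can be sketched in a few lines. This is an illustrative model only, not Hyrax's implementation: the step names come from the list above, while `Step`, `run_pipeline`, `demo_steps`, and the choice to treat every step as critical are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Step names are taken from the text above; everything else is hypothetical.
STEP_NAMES = [
    "test baseline", "fix agent", "diff size guard", "test regression",
    "build", "auto-format", "lint", "cross-project test",
    "scanner quality loop", "review loop", "post-fix audit",
    "detection query verify", "push and PR",
]


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True when the step passes
    critical: bool = True    # assumption: the source does not say which steps are critical


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Run steps in order and stop at the first failed critical step."""
    log: List[Tuple[str, bool]] = []
    for step in steps:
        passed = step.run()
        log.append((step.name, passed))
        if not passed and step.critical:
            return False, log  # abort: later steps, including the PR, never run
    return True, log


def demo_steps(failing: str = "") -> List[Step]:
    """Build stand-in steps; the one named `failing` reports a failure."""
    return [Step(name, lambda name=name: name != failing) for name in STEP_NAMES]


ok, log = run_pipeline(demo_steps(failing="build"))
print(ok, [name for name, _ in log])
```

In this sketch a failed build stops the run after five steps, so the final "push and PR" step is never reached.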

How is Hyrax different from Copilot or Cursor?

Copilot and Cursor help you write code faster. Hyrax ships clean code. It audits issues, fixes them, and opens PRs for you to review and merge. Different category, different outcome.

Scan profiles your entire codebase, your architecture, conventions, patterns, and creates an Agent Context stored in your .hyrax/ folder. Then it runs six agent groups plus a deterministic scanner. Scan produces findings and easy wins, each with a change plan ready for Fix.

Every change ships as a pull request with the [Hyrax] prefix. PR Review reviews every opened pull request automatically against your codebase conventions, leaving comments that update as your code changes. It can block merge on must-fix findings. Available on Pro and Team.

What languages are supported?

Hyrax works across 19 languages: Python, TypeScript, JavaScript, Go, Rust, Swift, Ruby, Java, Kotlin, C#, C++, C, PHP, Scala, Dart, Elixir, Shell, Lua, and MDX. It also works with the frameworks built on them, including React, Next.js, Vue, Svelte, Angular, Node.js, Django, Rails, Spring, FastAPI, Express, React Native, and Flutter.

What integrations are supported?

GitHub for source control. Linear for ticket management. Tickets are created on audit and closed automatically when fixes merge.

All inference runs in our AWS Bedrock account. We do not train on your code.

RESEARCH · JUNE 12, 2026 · 6 MIN READ

Nearly Half of Agent PRs Get Rejected: What the Data Says and Why

New benchmark data puts agent PR merge rates in the 60% range. Here is what drives those rejections, and why autonomous review has become a cost line.

Roughly four in ten pull requests submitted with heavy agentic involvement are rejected before they merge. That is not a rounding error or a pilot-program artifact. It is a structural cost that compounds every sprint, on every team, at every adoption level. The review cycles, CI runs, and human attention absorbed by rejected PRs are the hidden price of agent-speed generation without agent-quality governance.

The Numbers

Jellyfish's production benchmark, covering roughly 250,000 developers and 40 million data points across approximately 1,000 enterprise companies, arrives at a clear picture: merge rates on AI-assisted PRs have dropped from ~80% to ~60% as agents have taken on a larger share of total diffs. Jellyfish reported a 260% year-over-year increase in AI coding usage from June 2024 to May 2026, and the share of PRs with high AI involvement jumped from 14% to 51% in roughly 12 months.

The headline: roughly four in ten agent-heavy PRs do not survive review.

Why Agent PRs Fail

The failure modes are well-documented at this point. AI policy desk's review catalogues six recurring patterns: hallucinated APIs where the function name looks real but the package does not exist; plausible-but-wrong logic that passes a casual read and fails on edge cases; decorative security checks that are present but bypassable; tests that assert what the code does rather than whether it should; scope creep where the agent refactors unrelated files; and confident comments describing intended behavior while the code does something else.

The hallucinated API problem deserves particular attention. An agent working without access to a project's actual dependency manifest will invent plausible method signatures. The code compiles against a stub, passes a surface review, and breaks at runtime. This is not a model quality failure in any simple sense. It is a context failure.

Context Debt Is the Root Cause

The emerging consensus among teams that have improved merge rates is that instruction files (CLAUDE.md, AGENTS.md, or equivalent repo-level configuration) materially reduce rejection rates. The mechanism is straightforward: agents given explicit context about package managers, test runners, security-sensitive paths, and architectural constraints make fewer assumptions. Fewer assumptions mean fewer hallucinated dependencies, fewer tests that only pass by mirroring the code, and fewer scope-creep refactors nobody asked for.

Instruction files do not fix model quality. They reduce context debt. That distinction matters because context debt is a fixable repo property, while model quality is not under a team's control.
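Because context debt is a property of the repository, its most basic form (no instruction file at all) is easy to detect. The sketch below is a minimal illustration: the two file names come from the text above, while the function name `find_instruction_files` and the decision to look only in the repository root are assumptions.

```python
from pathlib import Path
from typing import List

# File names mentioned in the text; other equivalents may exist in practice.
INSTRUCTION_FILES = ("CLAUDE.md", "AGENTS.md")


def find_instruction_files(repo_root: str) -> List[str]:
    """List the known instruction files present in the repository root."""
    root = Path(repo_root)
    return [name for name in INSTRUCTION_FILES if (root / name).is_file()]


# An empty result suggests agents are working without repo-level context.
print(find_instruction_files("."))
```

Presence alone says nothing about quality. A file that omits the package manager or the test runner still leaves the agent guessing.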

What the Drop Costs in Practice

Jellyfish's data shows that teams using agentic tools are shipping approximately 2x as many pull requests as before adoption. Pair that with a merge rate that has fallen 20 points and the math becomes uncomfortable. A team that generated 100 PRs per month before agents is now generating ~200. At a ~60% merge rate, roughly 80 of those will not merge. Before agents, at ~80%, about 20 failed to merge. That is roughly 60 additional rejected PRs a month, and the review capacity required to process them is not free. CI runs on closed PRs are not free. Reviewer attention absorbed by a rejected diff is attention not spent on the diffs that do ship.
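The arithmetic behind those figures is short enough to write out. The sketch below uses the article's rounded numbers (100 and 200 PRs per month, 80% and 60% merge rates). The function name `rejected_prs` is hypothetical, and the result illustrates the example team rather than measured data.

```python
def rejected_prs(total_prs: int, merge_rate: float) -> int:
    """Number of PRs that do not merge at a given merge rate."""
    return round(total_prs * (1 - merge_rate))


before = rejected_prs(100, 0.80)  # pre-agent volume and merge rate
after = rejected_prs(200, 0.60)   # roughly 2x volume at the lower merge rate

print(before, after, after - before, after / before)  # → 20 80 60 4.0
```

PR volume doubles, but the number of rejected PRs quadruples, which is why the cost shows up in review capacity before it shows up anywhere else.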

Bug-related PRs and rollback rates have stayed relatively flat, which is the good news. The quality that reaches production is holding. The waste is upstream of production, in the review layer, and it is growing faster than most teams' review capacity.

Why Autonomous Review Is Now a Budget Line

This is the context in which autonomous code review stops being a productivity convenience and starts being a financial argument. Every rejected PR costs real resources. If a meaningful share of those rejections are for predictable, catchable reasons (hallucinated dependencies, missing behavioral tests, security checks that are decorative, context mismatches), then catching them before they enter the human review queue has a measurable dollar value.

Hyrax reviews every diff across multiple domains (security, code quality, reliability, API and data, ops, UX) and runs verification before proposing a fix PR. The human merges. That review happens on all code, not only agent-authored code, which matters because the failure modes above appear in human-authored diffs too. But the dropping merge rate on agent PRs makes the case quantitatively. At that rate, the cost of not reviewing before merge is no longer theoretical.

The post on five AI-code failures your CI does not catch details specific failure patterns (decorative assertions, hallucinated imports, behavioral gaps) that standard CI pipelines miss precisely because they are syntactically valid. Those patterns map directly onto the rejection reasons surfaced in the benchmark data.

What Changes If Instruction Files Become Standard

The instruction-file finding has an architectural implication worth naming. If CLAUDE.md or AGENTS.md reduces agent rejection rates by encoding repo context explicitly, then the codebase itself becomes part of the governance layer. Architecture decisions, dependency constraints, security-sensitive paths, test requirements: all of these can be committed as agent-readable files rather than left to be inferred from the codebase or passed in prompts.

This is not a novel idea. It is a precise formalization of what senior engineers do during code review: they enforce context the agent never had. Making that context explicit, versioned, and checked into the repository shifts governance upstream, before generation rather than after it. The merge-rate drop does not have to be permanent.

Hyrax is live at hyrax.dev.
