Clean code in. Clean PRs out. Hyrax audits the codebase, surfaces findings across security, correctness, maintainability, performance, architecture, and operations, then writes fixes and opens PRs. You review and merge every PR.

How does pricing work?

One flat price per workspace, not per seat. Free: 1 private repo, 1 mini-audit per month, verified fixes, no credit card required. Pro: $30/mo with $30 of usage included, up to 3 repos, the full audit pipeline, and opt-in overage beyond the included usage. Team: $200/mo flat with $200 of usage included, unlimited repos and seats. Included usage does not roll over.
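A minimal sketch of how one month on a flat plan with included usage might be billed. It assumes three things the pricing above does not state: usage is metered in US dollars, opted-in overage is billed at that same metered rate, and usage stops at the included amount without opt-in. `Plan` and `monthly_charge` are hypothetical names. Because included usage does not roll over, each month is computed on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    base_fee: float        # flat monthly price per workspace (USD), not per seat
    included_usage: float  # USD of usage included each month; unused usage is lost


PRO = Plan(name="Pro", base_fee=30.0, included_usage=30.0)


def monthly_charge(plan: Plan, usage: float, overage_opt_in: bool) -> tuple[float, float]:
    """Return (amount billed, usage served) for a single month.

    Assumption: opted-in overage is billed 1:1 at metered cost; without
    opt-in, usage is capped at the included amount.
    """
    if usage <= plan.included_usage:
        return plan.base_fee, usage
    if overage_opt_in:
        return plan.base_fee + (usage - plan.included_usage), usage
    return plan.base_fee, plan.included_usage


# No rollover: a light month does not credit the next one.
months = [10.0, 45.0]
bills = [monthly_charge(PRO, u, overage_opt_in=True)[0] for u in months]
```

Under these assumptions, a $10 month still costs the $30 base fee, and the $20 left unused does not offset the following $45 month.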

What do I get with Pro vs Team?

Pro: up to 3 repos, the full audit + scan pipeline, PR reviews on every opened PR, and auto-publish to Linear. Team: unlimited repos and seats, plus the self-improvement learn loop and public repos by URL.

What is the 13-step verification?

Every fix runs through 13 steps before a PR opens: test baseline, fix agent, diff size guard, test regression, build, auto-format, lint, cross-project test, scanner quality loop, review loop, post-fix audit, detection query verify, push and PR. A failure at any critical step aborts the run.
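The abort rule can be sketched as a sequential runner. This is an illustration under stated assumptions, not Hyrax's implementation: `Step`, `RunResult`, `run_pipeline`, and the `critical` flag are hypothetical, and because the FAQ does not say which steps are critical, every step defaults to critical here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# The 13 steps, in the order listed above.
STEP_NAMES = [
    "test baseline", "fix agent", "diff size guard", "test regression",
    "build", "auto-format", "lint", "cross-project test",
    "scanner quality loop", "review loop", "post-fix audit",
    "detection query verify", "push and PR",
]


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True when the step passes
    critical: bool = True    # hypothetical: which steps are critical is not specified


@dataclass
class RunResult:
    completed: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)  # non-critical failures
    aborted_at: Optional[str] = None


def run_pipeline(steps: list[Step]) -> RunResult:
    """Run steps in order; a critical failure stops the run before any PR opens."""
    result = RunResult()
    for step in steps:
        if step.run():
            result.completed.append(step.name)
        elif step.critical:
            result.aborted_at = step.name
            return result
        else:
            result.warnings.append(step.name)
    return result


# Simulate a failed build: steps after it, including "push and PR", never run.
steps = [Step(name, lambda: True) for name in STEP_NAMES]
steps[4] = Step("build", lambda: False)
result = run_pipeline(steps)
```

Putting "push and PR" last means any critical abort leaves nothing to review, which matches the FAQ's claim that every fix is verified before a PR opens.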

How is Hyrax different from Copilot or Cursor?

Copilot and Cursor help you write code faster. Hyrax ships clean code. It audits the codebase for issues, fixes them, and opens PRs for you to review and merge. Different category, different outcome.

Scan profiles the entire codebase — architecture, conventions, patterns — and creates an Agent Context stored in the .hyrax/ folder. Then it runs six agent groups plus a deterministic scanner. Scan produces findings and easy wins, each with a change plan ready for Fix.

Every change ships as a pull request with the [Hyrax] prefix. PR Review automatically reviews every opened pull request against the codebase's conventions and leaves comments that update as the code changes. It can block merge on must-fix findings. Available on Pro and Team.
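A minimal sketch of the merge gate described above: a PR stays blocked while any must-fix finding on it is unresolved. The `Finding` shape, the severity labels other than must-fix, the `block_on_must_fix` switch, and the assumption that the [Hyrax] prefix goes in the PR title are all illustrative, not Hyrax's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    MUST_FIX = "must-fix"
    SHOULD_FIX = "should-fix"  # hypothetical label
    EASY_WIN = "easy-win"      # hypothetical label for Scan's easy wins


@dataclass
class Finding:
    title: str
    severity: Severity
    change_plan: list[str] = field(default_factory=list)  # steps handed to Fix
    resolved: bool = False


def pr_title(summary: str) -> str:
    """Assumed convention: the [Hyrax] prefix is prepended to the PR title."""
    return f"[Hyrax] {summary}"


def merge_blocked(findings: list[Finding], block_on_must_fix: bool = True) -> bool:
    """Block merge only when enabled and an unresolved must-fix finding remains."""
    return block_on_must_fix and any(
        f.severity is Severity.MUST_FIX and not f.resolved for f in findings
    )


findings = [
    Finding("Unused import", Severity.EASY_WIN),
    Finding("Unparameterized SQL query", Severity.MUST_FIX, ["Use bound parameters"]),
]
```

Easy wins and lower-severity findings never block merge in this sketch. Only an open must-fix item does, and resolving it lifts the block.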

What languages are supported?

Hyrax works across 19 languages: Python, TypeScript, JavaScript, Go, Rust, Swift, Ruby, Java, Kotlin, C#, C++, C, PHP, Scala, Dart, Elixir, Shell, Lua, and MDX. It works with the frameworks built on them — React, Next.js, Vue, Svelte, Angular, Node.js, Django, Rails, Spring, FastAPI, Express, React Native, and Flutter.

What integrations are supported?

GitHub for source control and Linear for ticket management. Linear tickets are created when an audit runs and close automatically when the fixes merge.

All inference runs in the Hyrax AWS Bedrock account. Hyrax does not train on customer code.

RESEARCH · JUNE 14, 2026 · 5 MIN READ

Code Writes Itself. Review Doesn't. The MIT/Wharton Numbers.

A May 2026 NBER study of 100,000+ developers found AI coding agents produced 741% more code but only 20% more releases, confirming review as the binding constraint.


A May 2026 NBER working paper by Mert Demirer, Leon Musolff, and Liyuan Yang studied more than 100,000 GitHub developers across three generations of AI coding tools. The central finding: sync coding agents produced 741% more lines of code, while actual software releases rose only 20%. The bottleneck was never writing code. It was always everything that happens afterward.

What the Study Measured

The paper, NBER Working Paper 35275, traced developer output at every stage of the production chain, from lines written to commits, pull requests, projects, and finally releases. Each generation of tooling showed the same compression pattern. Autocomplete raised lines of code by 228% but releases by only 10%. Sync agents raised lines of code by 741%, pull requests by 65%, and releases by 20%. Async agents drove lines of code up roughly 1,700%, while releases rose only 30%.
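The compression is easier to see as multipliers. The short calculation below uses only the percentages reported above (the async figure is approximate) and computes what we'll call a pass-through ratio: the release gain divided by the lines-of-code gain. The ratio is our own framing, not a metric from the paper.

```python
# Percentage increases reported above, by tool generation.
# The async-agent lines-of-code figure is approximate ("roughly 1,700%").
GENERATIONS = {
    "autocomplete": {"lines": 228.0, "releases": 10.0},
    "sync agents": {"lines": 741.0, "releases": 20.0},
    "async agents": {"lines": 1700.0, "releases": 30.0},
}


def multiplier(pct_increase: float) -> float:
    """Convert a percentage increase to an output multiplier, e.g. 228% -> 3.28x."""
    return 1 + pct_increase / 100


def pass_through(gen: dict) -> float:
    """Share of the lines-of-code gain that shows up as a release gain."""
    return gen["releases"] / gen["lines"]


for name, gen in GENERATIONS.items():
    print(
        f"{name}: lines {multiplier(gen['lines']):.2f}x, "
        f"releases {multiplier(gen['releases']):.2f}x, "
        f"pass-through {pass_through(gen):.1%}"
    )
```

Each generation multiplies lines of code more, yet a smaller share of that gain reaches releases: about 4.4% for autocomplete, 2.7% for sync agents, and 1.8% for async agents.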

The gain shrinks at every downstream step. The authors call this the "weak-link hypothesis": accelerating one stage does not remove the constraint; it moves it to the next human-dependent stage.
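A toy model makes the weak-link idea concrete. It assumes a strictly serial pipeline whose throughput equals its slowest stage, with hypothetical weekly capacities. Real delivery pipelines also involve queueing, rework, and partial parallelism, which this sketch ignores.

```python
def throughput(capacities: dict[str, float]) -> float:
    """A serial pipeline can ship no faster than its slowest stage."""
    return min(capacities.values())


def bottleneck(capacities: dict[str, float]) -> str:
    """Name of the stage currently setting the ceiling."""
    return min(capacities, key=lambda stage: capacities[stage])


# Hypothetical capacities, in changes per week. Writing starts as the constraint.
baseline = {"write": 30.0, "review": 40.0, "integrate": 60.0, "release": 50.0}

# Make writing 8x faster: the constraint moves to review.
fast_writing = {**baseline, "write": 240.0}

# Double review capacity as well: the constraint moves again, to release.
more_review = {**fast_writing, "review": 80.0}

for label, caps in [("baseline", baseline), ("fast writing", fast_writing), ("more review", more_review)]:
    print(f"{label}: {throughput(caps):.0f}/week, bottleneck = {bottleneck(caps)}")
```

An 8x speedup in writing lifts throughput only from 30 to 40 changes per week, because review becomes the binding stage. Raising review capacity then exposes release as the next one.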

The researchers also checked four major app marketplaces. New app creation rose sharply. Total user downloads across those same cohorts: flat. The share of new apps failing to reach a minimal audience climbed from about 79% to 86%. More supply, no demand response.

The Elasticity Number That Matters

The authors estimated an elasticity of substitution of 0.25 between AI-generated output and human review effort. A value below 1.0 means the two inputs are complements; at 0.25, they are strong complements. Doubling AI code output does not reduce the need for review. It raises demand for review while the supply of human reviewer hours stays constant.
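To see what an elasticity of 0.25 implies, here is a constant-elasticity-of-substitution (CES) aggregate of AI-generated code X and review effort R. Only σ = 0.25 comes from the paper. The CES functional form, the equal 0.5 weights, and the normalization X = R = 1 are assumptions for illustration, not the authors' estimated model.

```python
import math


def ces_output(ai_code: float, review: float, sigma: float = 0.25, share: float = 0.5) -> float:
    """CES aggregate Y = (a*X**rho + (1-a)*R**rho) ** (1/rho), rho = (sigma-1)/sigma.

    sigma is the elasticity of substitution; share (a) weights AI code.
    sigma = 1 is the Cobb-Douglas limit, handled separately to avoid 1/0.
    """
    if math.isclose(sigma, 1.0):
        return ai_code**share * review ** (1 - share)
    rho = (sigma - 1) / sigma  # sigma = 0.25 gives rho = -3
    return (share * ai_code**rho + (1 - share) * review**rho) ** (1 / rho)


review_hours = 1.0  # human review supply held fixed
for ai in (1.0, 2.0, 10.0, 1e6):
    print(f"AI code x{ai:g}: output {ces_output(ai, review_hours):.3f}")
```

With review held fixed, doubling AI code raises output by only about 21% in this sketch, and no amount of extra AI code alone lifts it past 2^(1/3) ≈ 1.26. Under Cobb-Douglas (σ = 1), the same doubling would give √2 ≈ 1.41. The lower the elasticity, the more the fixed review input caps the result.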

That number helps explain why 65% more pull requests did not produce 65% more releases: the review queue grew faster than reviewers could clear it.
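The queue arithmetic is simple. Consider a team, with hypothetical numbers, that reviews 100 pull requests a week. If it suddenly receives 65% more, it adds 65 unreviewed PRs to its backlog every week for as long as review capacity stays flat. `backlog_by_week` is an illustrative helper, not a model from the paper.

```python
def backlog_by_week(weeks: int, arrivals: float, capacity: float, start: float = 0.0) -> list[float]:
    """Open-PR backlog at the end of each week with fixed weekly review capacity."""
    backlog = start
    history = []
    for _ in range(weeks):
        # The backlog can drain to zero but never goes negative.
        backlog = max(0.0, backlog + arrivals - capacity)
        history.append(backlog)
    return history


before = backlog_by_week(4, arrivals=100.0, capacity=100.0)
after = backlog_by_week(4, arrivals=165.0, capacity=100.0)  # 65% more PRs, same reviewers
```

At matched rates the backlog stays at zero. After the jump it grows by 65 PRs every week and never recovers without added review capacity.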

An earlier METR trial found developers using AI coding tools believed they finished work 20% faster when actual completion times ran 19% longer. That perception gap is part of why teams undercount how much of their productivity gain has been absorbed by review, integration, and release overhead rather than captured as shipped software.

Where the Cost Moved

For roughly a decade, the binding constraint in software development was writing code. AI removed that constraint quickly and at scale. The constraint did not disappear. It moved.

Integrating changes across a codebase that was modified faster than any human could review it, managing 65% more pull requests with the same reviewer headcount, handling security patching and dependency hygiene on a volume of code that previously did not exist: none of that got faster because the diff arrived sooner. The expensive part of software development relocated downstream, and most tooling investment has not followed it there.

Anthropic disclosed in June 2026 that more than 80% of its production code merged in May was authored by Claude, producing an 8x increase in code volume per engineer per quarter compared to its 2021–2025 baseline. That is the direction the numbers are moving industrywide.

The Review Problem Is Mostly Mechanical

The Demirer, Musolff, and Yang paper does not prescribe a solution. It does identify where the attenuation happens: review, integration, security, and release management are the stages where human judgment currently sets the ceiling on throughput.

Not all of that judgment is irreplaceable. Style consistency, known vulnerability patterns, dependency hygiene, API contract compliance, and regression risk on changed paths are deterministic enough to be checked without a human in the loop, and they consume a substantial portion of review time. Automating them does not replace the judgment that matters. It clears the queue so judgment can be applied to the 20% of review that actually requires it: architecture decisions, cross-system effects, and correctness in novel territory.
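One way to picture that split is a router that sends the categories named above to automated checks and everything else to a person. The category strings, `ReviewItem`, and `triage` are hypothetical. Sending unknown categories to a human reviewer by default is a deliberately conservative choice in this sketch.

```python
from dataclasses import dataclass

# Categories the paragraph above calls deterministic enough to check automatically.
MECHANICAL = {
    "style consistency",
    "known vulnerability pattern",
    "dependency hygiene",
    "api contract compliance",
    "regression risk on changed paths",
}


@dataclass(frozen=True)
class ReviewItem:
    pr: int
    category: str


def triage(items: list[ReviewItem]) -> tuple[list[ReviewItem], list[ReviewItem]]:
    """Split review work into (automated, human) queues."""
    automated: list[ReviewItem] = []
    human: list[ReviewItem] = []
    for item in items:
        if item.category in MECHANICAL:
            automated.append(item)
        else:
            # Architecture, cross-system effects, and anything unrecognized need judgment.
            human.append(item)
    return automated, human


items = [
    ReviewItem(1, "style consistency"),
    ReviewItem(2, "architecture decision"),
    ReviewItem(3, "dependency hygiene"),
    ReviewItem(4, "something new"),
]
automated, human = triage(items)
```

The human queue shrinks to the items that need judgment. A category the router has never seen lands with a person rather than slipping through unreviewed.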

That is the division of labor the NBER data points toward. The paper's own framing (complements, not substitutes) describes it precisely. More AI-generated code upstream requires more review capacity downstream, and the only path to closing the 741%-to-20% gap is making review faster without making it shallower.

What Autonomous Review Changes

Hyrax runs across six domains (security, code quality, reliability, API and data contracts, ops, and UX) and executes 13 verification steps before submitting a PR with fixes. It handles the mechanical fraction of review at machine speed: the findings that do not require judgment, the corrections that follow deterministic rules, the patterns that repeat across codebases. The human reviewer sees a smaller queue with a verified fix already attached to each finding and can reserve attention for the decisions that actually warrant it.

The study's weak-link framing is accurate. The answer is not more human reviewers. The review chain has a mechanical segment and a judgment segment. Mechanizing the first one is the only way to stop the second one from becoming a permanent ceiling on what ships.

Hyrax is live at hyrax.dev.

