Clean code in. Clean PRs out. Hyrax audits your codebase, surfaces findings across security, correctness, maintainability, performance, architecture, and operations, then writes fixes and opens PRs. You review and merge every PR.

How does pricing work?

One flat price per workspace, not per seat. Free: 1 private repo, 1 mini-audit per month, verified fixes, no card. Pro: $30/mo with $30 of usage included, up to 3 repos, full audit pipeline, opt-in overage after. Team: $200/mo flat with $200 of usage included, unlimited repos and seats. Included usage does not roll over.

What do I get with Pro vs Team?

Pro: up to 3 repos, the full audit + scan pipeline, PR reviews on every opened PR, and auto-publish to Linear. Team: unlimited repos and seats, plus the self-improvement learn loop and public repos by URL.

What is the 13-step verification?

Every fix runs through 13 steps before a PR opens: test baseline, fix agent, diff size guard, test regression, build, auto-format, lint, cross-project test, scanner quality loop, review loop, post-fix audit, detection query verify, push and PR. A failure at any critical step aborts the run.

How is Hyrax different from Copilot or Cursor?

Copilot and Cursor help you write code faster. Hyrax ships clean code. It audits issues, fixes them, and opens PRs for you to review and merge. Different category, different outcome.

Scan profiles your entire codebase, your architecture, conventions, patterns, and creates an Agent Context stored in your .hyrax/ folder. Then it runs six agent groups plus a deterministic scanner. Scan produces findings and easy wins, each with a change plan ready for Fix.

Every change ships as a pull request with the [Hyrax] prefix. PR Review reviews every opened pull request automatically against your codebase conventions, leaving comments that update as your code changes. It can block merge on must-fix findings. Available on Pro and Team.

What languages are supported?

Hyrax works across 19 languages: Python, TypeScript, JavaScript, Go, Rust, Swift, Ruby, Java, Kotlin, C#, C++, C, PHP, Scala, Dart, Elixir, Shell, Lua, and MDX. It works with the frameworks built on them — React, Next.js, Vue, Svelte, Angular, Node.js, Django, Rails, Spring, FastAPI, Express, React Native, and Flutter.

What integrations are supported?

GitHub for source control. Linear for ticket management. Tickets are created on audit and closed automatically when fixes merge.

All inference runs in our AWS Bedrock account. We do not train on your code.

CODE HEALTH · MAY 14, 2026 · 10 MIN READ

Five AI-code failures your CI does not catch

Each failure mode has a short config block that catches it. The full set runs in under a minute. Add them in one sprint.

Your CI pipeline was designed for a world where humans wrote the code and the build system verified the basics. Compile, lint, test, deploy. That contract held for two decades because the failure modes were stable. A typo broke the compile. A logic error broke a test. A missing dependency broke the install.

That contract no longer holds. AI coding agents now write a meaningful fraction of every commit landing in main, and they introduce failure modes that the standard CI pipeline was never asked to catch. Tests pass. Builds compile. Lint runs clean. The bug ships anyway, because the failure lives at a layer your config does not check.

The five below are the most common. Each has a specific config block. None of them require a new vendor or a new tool. Each is plain YAML or shell, and each runs in your existing pipeline.

ci-snippets.yml

All five checks as one drop-in GitHub Actions workflow. Add the blocks together or one at a time.


1. The agent says tests passed. The tests were never run.

Add a deterministic test rerun the agent cannot influence, then compare exit codes.

This is the failure mode from the Cursor Composer 2.5 incident in late May. The model reported that a smoke test passed. The smoke test was a curl to localhost that returned a cached response. The build saw "tests pass", merged, and shipped a broken endpoint to production. The same pattern shows up across every coding agent that has a tool-use loop with shell access. The agent is the one running the tests and the agent is the one reporting whether they passed. There is no audit step between the report and the merge button.

The fix is to rerun the test step in a parallel container that does not share state with the agent's session, then fail the build if the exit codes disagree.

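# .github/workflows/independent-verification.yml
on: pull_request
jobs:
  independent_test:
    runs-on: ubuntu-latest
    env:
      # Assumption: the agent's claimed result is available at this path.
      AGENT_REPORT: agent-report.txt
    steps:
      - uses: actions/checkout@v4
      - name: Run tests in isolated container
        run: |
          # Record the exit code without letting bash -e abort the step first.
          status=0
          docker run --rm -v "$PWD:/app" -w /app node:20 \
            sh -c 'npm ci && npm test' || status=$?
          echo "$status" > /tmp/independent_exit
      - name: Compare against agent-reported result
        if: always()
        run: |
          # GITHUB_STEP_SUMMARY is a fresh file in every step, so the agent's
          # claim is read from a report file instead.
          if [ "$(cat /tmp/independent_exit)" != "0" ] && \
             grep -q "tests passed" "$AGENT_REPORT"; then
            echo "Agent reported pass. Independent run failed."
            exit 1
          fi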

What this catches: any drift between what the agent claims about test results and what a clean container actually produces. What it costs when missed: a class of production incident where the audit log says "approved and tested" but nothing was tested.
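The same comparison can live in a small script for pipelines that do not run on GitHub Actions. This is a minimal sketch, assuming the agent's claim has already been normalized to the string pass or fail. The function name reconcile and both strings are hypothetical and not part of any agent's API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reconcile an agent's claimed test result with the
# exit code of an independent run. The claim strings are assumptions.
reconcile() {
  local claim="$1" exit_code="$2"
  if [ "$claim" = "pass" ] && [ "$exit_code" -ne 0 ]; then
    echo "mismatch: agent claimed pass, independent run exited $exit_code"
    return 1
  fi
  if [ "$claim" = "fail" ] && [ "$exit_code" -eq 0 ]; then
    echo "mismatch: agent claimed fail, independent run passed"
    return 1
  fi
  echo "consistent"
}
```

Calling reconcile pass "$(cat /tmp/independent_exit)" returns non-zero on disagreement, so it can gate a merge directly. Flagging the reverse case (claimed fail, actual pass) is a design choice: it signals that the agent's report is unreliable in either direction.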

2. Hallucinated dependencies that npm install resolves to the wrong package

Add a manifest verification step that checks every new dependency against the registry before install runs.

AI coding agents hallucinate package names. Sometimes the name belongs to a real package owned by someone else (typosquatting). Sometimes the package does not exist at all, so the install fails, or passes later once someone registers the name. Sometimes the agent adds a real package at the wrong version. Whenever the name and version resolve, npm install exits zero and the build proceeds. The dependency you actually shipped is not the one the model intended, and nobody verifies the difference.

The fix is a pre-install diff against the registry for every new entry in package.json, requirements.txt, or Cargo.toml.

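- name: Verify dependency manifest
  run: |
    git fetch origin main
    # Compare dependency names only, so added scripts or config keys are ignored.
    deps='(.dependencies // {}) + (.devDependencies // {}) | keys[]'
    git show origin/main:package.json | jq -r "$deps" | sort -u > /tmp/base_deps
    jq -r "$deps" package.json | sort -u > /tmp/pr_deps
    # comm -13 keeps the names that appear only in the PR's manifest.
    comm -13 /tmp/base_deps /tmp/pr_deps > /tmp/new_deps
    while read -r pkg; do
      if ! npm view "$pkg" name >/dev/null 2>&1; then
        echo "Hallucinated or unregistered package: $pkg"
        exit 1
      fi
    done < /tmp/new_deps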

A stronger version pins the checksum, but the registry-existence check on its own catches most cases. For Python use pip index versions <pkg>. For Rust use cargo search <pkg> --limit 1 and compare the returned name exactly, because the search is fuzzy. What this catches: hallucinated names that resolve to nothing, and deleted packages. A typosquatted lookalike does exist in the registry, so catching it needs the stronger, pinned version. What it costs when missed: supply chain compromise via a package the model invented, or a runtime error in production from a dependency that resolves to nothing.
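For a Python manifest, the same idea can be sketched as two small functions: one that pulls package names out of a requirements.txt diff, and one that runs a registry checker on each name. The function names are hypothetical. The checker is passed in by name so that pip index versions, which pip marks as experimental, can be swapped for something else.

```shell
#!/usr/bin/env bash
# Sketch: registry check for Python manifests. Function names are hypothetical.

# Read a unified diff on stdin; print the package names on added lines.
# Skips the "+++" file header, comments, and option lines such as "-r dev.txt".
added_python_packages() {
  grep -E '^\+[A-Za-z0-9]' | sed -E 's/^\+([A-Za-z0-9._-]+).*/\1/' | LC_ALL=C sort -u
}

# Read package names on stdin; run the given checker on each one.
# Reports every failing name, then returns non-zero if any failed.
verify_packages() {
  local checker="$1" pkg status=0
  while read -r pkg; do
    if ! "$checker" "$pkg" </dev/null >/dev/null 2>&1; then
      echo "Unregistered package: $pkg"
      status=1
    fi
  done
  return "$status"
}

# Checker built on the pip command named above.
pypi_exists() { pip index versions "$1"; }

# Usage in CI (not run here):
#   git diff origin/main -- requirements.txt | added_python_packages | verify_packages pypi_exists
```

Unlike the npm block, this version reports every unresolved name before failing, which saves a round trip when an agent has invented more than one package.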

3. Duplicate utility functions the agent wrote instead of importing

Add a structural duplication check that fails the build when the diff increases duplication above a threshold.

The agent reads the file it is editing, not the whole codebase. If the project already has a formatDate helper in src/utils/format.ts, the agent often writes a second formatDate in the file it happens to be editing. Both functions now exist, both compile, both pass tests. The codebase has grown a duplicate.

This is the slop pattern at scale. GitClear's 2026 report found duplicated code blocks rising eightfold since AI tools mainstreamed. Most teams find out at quarterly cleanup or when a bug fix has to be applied in three places.

The fix is to track duplication as a metric on every PR and fail when the delta exceeds a small threshold.

- name: Duplication delta
  run: |
    git fetch origin main
    # No --threshold flag: jscpd exits non-zero when duplication exceeds it,
    # which would abort the step before the delta is computed.
    npx jscpd --silent --min-lines 5 \
              --reporters json --output ./jscpd-pr ./src
    # Measure the base in a separate worktree so files added by the PR
    # are not counted on both sides.
    git worktree add --detach /tmp/base origin/main
    (cd /tmp/base && npx jscpd --silent --min-lines 5 \
              --reporters json --output "$GITHUB_WORKSPACE/jscpd-main" ./src)
    pr=$(jq '.statistics.total.duplicatedLines' jscpd-pr/jscpd-report.json)
    base=$(jq '.statistics.total.duplicatedLines' jscpd-main/jscpd-report.json)
    delta=$((pr - base))
    if [ "$delta" -gt 30 ]; then
      echo "PR adds $delta duplicated lines"
      exit 1
    fi

The 30-line threshold is illustrative. Set yours against your codebase's baseline. What this catches: AI-written replicas of helpers that already exist. What it costs when missed: a codebase that doubles in maintenance surface every quarter.
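One way to set the threshold against a baseline is to derive it from the base branch's duplicated-line count instead of hard-coding it. This is a minimal sketch. The function name dup_budget, the percentage, and the floor are all illustrative assumptions, not recommended values.

```shell
#!/usr/bin/env bash
# Sketch: derive the allowed duplication delta from the base branch's count.
# The percentage and floor are illustrative assumptions, not recommendations.
dup_budget() {
  local base_lines="$1" percent="$2" floor="$3"
  local budget=$(( base_lines * percent / 100 ))
  # Small codebases would otherwise get a budget of zero.
  if [ "$budget" -lt "$floor" ]; then
    budget="$floor"
  fi
  echo "$budget"
}
```

In the workflow, the fixed 30 would become "$(dup_budget "$base" 1 10)", so the allowance grows with the codebase and never drops below the floor.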

4. Environment file drift across .env.example, schema, and deploy config

Add a check that every new key in .env.example exists in the validation schema and in the production deploy config.

The agent adds a feature that needs a new environment variable. It updates .env.example so the local dev experience works. It does not update the Zod or Pydantic schema that validates env at boot, and it does not update the Terraform / Helm / Vercel / Render config that injects the var in production. Locally the feature works. In production, the variable is undefined and the feature silently fails or the service refuses to start at boot.

The fix is a three-way diff.

- name: Env file consistency
  run: |
    # Key names may contain digits (S3_BUCKET, OAUTH2_SECRET), so allow 0-9.
    keys_example=$(grep -E '^[A-Z0-9_]+=' .env.example | cut -d= -f1 | sort -u)
    keys_schema=$(grep -oE 'process\.env\.[A-Z0-9_]+' src/env.ts | \
                  sed 's/process\.env\.//' | sort -u)
    keys_deploy=$(grep -E '^\s*[A-Z0-9_]+:' deploy/production.yaml | \
                  awk '{print $1}' | tr -d ':' | sort -u)
    # comm -23 prints lines that appear only in the first sorted input.
    missing=$(comm -23 <(echo "$keys_example") <(echo "$keys_schema"))
    if [ -n "$missing" ]; then
      echo "Keys in .env.example missing from schema: $missing"
      exit 1
    fi
    missing_deploy=$(comm -23 <(echo "$keys_example") <(echo "$keys_deploy"))
    if [ -n "$missing_deploy" ]; then
      echo "Keys in .env.example missing from deploy config: $missing_deploy"
      exit 1
    fi

Adapt the paths to your stack. The principle stays the same. What this catches: env keys that exist in one config and not another. What it costs when missed: a feature that works on every developer's machine and fails at production boot.
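As one example of adapting the check: a Zod object schema and a YAML deploy file both declare variables as KEY: entries, so a single extractor can cover both targets. This is a minimal sketch under that assumption. The function names and file layouts are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: key extraction for stacks where the schema and the deploy config
# both declare variables as "KEY:" entries. File layouts are assumptions.

# Keys assigned in a dotenv-style file, e.g. "S3_BUCKET=".
env_keys() {
  grep -oE '^[A-Z][A-Z0-9_]*=' "$1" | tr -d '=' | LC_ALL=C sort -u
}

# Keys declared as "KEY:" at the start of a line, allowing indentation.
declared_keys() {
  grep -oE '^[[:space:]]*[A-Z][A-Z0-9_]*:' "$1" \
    | sed -E 's/^[[:space:]]*//; s/:$//' | LC_ALL=C sort -u
}

# Print keys present in the dotenv file ($1) but not declared in the target ($2).
missing_from() {
  local key
  env_keys "$1" | while read -r key; do
    declared_keys "$2" | grep -qxF "$key" || echo "$key"
  done
}
```

Running missing_from .env.example src/env.ts and missing_from .env.example deploy/production.yaml covers the schema and deploy sides of the three-way diff. A non-empty result from either call should fail the step.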

5. Unbatched I/O patterns the tests do not exercise

Add a query-count assertion to integration tests and a static check that flags await inside for loops on database or HTTP clients.

AI agents reliably write code in the shape of for (const id of ids) { await db.fetch(id) }. It is the most natural-looking pattern in the languages they are most fluent in. It also generates one round-trip per element. Tests pass because the test fixture has three records. Production has ten thousand records and a database connection pool that backs up under the load. The fix in code is Promise.all, batch queries, or DataLoader. The fix in CI is to catch the pattern before merge.

The static check is short:

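- name: Detect awaited-in-loop database calls
  run: |
    # -z reads each file as one record so the pattern can span lines
    # (formatters put the loop body on its own line); -l lists matching files.
    matches=$(grep -rlPz 'for\s*\([^)]+\)\s*\{[^}]*await\s+(db|prisma|knex|orm)\.' src/ || true)
    if [ -n "$matches" ]; then
      echo "Awaited DB call inside loop:"
      echo "$matches"
      exit 1
    fi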

The grep is a starting point. Replace with an AST query for production use; this catches the most common syntactic shape and surfaces the file for review. Combine with a query-count assertion in your highest-traffic integration tests. Most ORMs let you record the query count per request; assert the count stays below a fixed ceiling for any endpoint that handles a list.

What this catches: the N+1 query pattern AI writes by default. What it costs when missed: a latency regression that does not show up until production traffic.
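The query-count assertion can also be approximated in the pipeline itself when the ORM can log statements to a file. This is a minimal sketch, assuming the integration test for one request wrote one SQL statement per line. The log format and the function name assert_query_ceiling are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail when a request issued more SQL statements than a fixed ceiling.
# Assumes the test run wrote one statement per line to a log file; the log
# format and function name are hypothetical.
assert_query_ceiling() {
  local log="$1" ceiling="$2" count
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  count=$(grep -cE '^(SELECT|INSERT|UPDATE|DELETE) ' "$log" || true)
  if [ "$count" -gt "$ceiling" ]; then
    echo "Query count $count exceeds ceiling $ceiling"
    return 1
  fi
  echo "Query count $count within ceiling $ceiling"
}
```

A list endpoint that batches its reads should stay at a small constant count as the fixture grows, while an awaited-in-loop version scales with the number of records. For that reason, run the assertion against a fixture larger than the ceiling.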

What to add this sprint

The five blocks above run in under a minute total. None of them require a new vendor or a service contract. They live in the .github/workflows/ directory next to the rest of your CI. The teams that have added them report the same pattern: the first few PRs after deployment fail one of the checks, the team fixes the underlying behavior, and the failure rate drops to near zero within two sprints. The check stays in place because the failure mode it catches is not going away.

The reason these are not in the standard CI pipeline today is that the standard pipeline was designed against a different threat model. The new threat model is an agent that reports its own success, writes code without reading the rest of the codebase, and reaches for the syntactic pattern most common in its training data. None of those characteristics are caught by linters, type systems, or unit tests. They are caught by checks that compare what the agent did to what the rest of the system expects.

Run the five. Watch which one fails first. That one is your team's biggest exposure.

Sources: Cursor Composer 2.5 incident reports, r/cursor megathreads, late May 2026. GitClear AI Code Quality Report 2026 on duplication trends. Entelligence Code Review Benchmark 2026 on AI reviewer coverage gaps. Anthropic Project Glasswing Initial Update on vulnerability discovery velocity.