ENGINEERING · JUNE 22, 2026 · 6 MIN READ

The audit trail agents don't leave by default

Asymptote Labs released Beacon this week: open-source telemetry that captures what Claude Code, Cursor, and Codex CLI actually do, not just what they output.

Asymptote Labs released Beacon on June 22, 2026: an open-source project that configures telemetry for Claude Code, Codex CLI, Cursor, and Claude Cowork, then writes a normalized record of what each agent did across local, CI, and cloud surfaces. The tool exists because the default state today is nothing. A PR merges. The diff is visible. The tool calls, file reads, and shell commands that produced it are not.

What a diff does and does not tell you

A git diff is a final-state artifact. It records which lines changed, in which files, at which commit. It does not record which files the agent read but did not change. It does not record which shell commands ran during the session. It does not record whether the agent read a .env file, called kubectl, or pulled a new dependency whose postinstall script executed arbitrary code. The diff is useful. It is not a behavioral record.
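To make the gap concrete, here is a minimal Python sketch that compares the paths a diff would report with a list of session events. The event shape (`kind`, `path`, `command`) and the sample data are invented for illustration; they are not Beacon's schema or any tool's real output.

```python
from typing import Iterable, List


def read_but_unchanged(events: Iterable[dict], changed_paths: Iterable[str]) -> List[str]:
    """Return files the agent read that never appear in the diff."""
    changed = set(changed_paths)
    read = {e["path"] for e in events if e.get("kind") == "file_read"}
    return sorted(read - changed)


# Hypothetical session events; the field names are assumptions.
events = [
    {"kind": "file_read", "path": "src/app.py"},
    {"kind": "file_read", "path": ".env"},
    {"kind": "file_write", "path": "src/app.py"},
    {"kind": "shell", "command": "npm install left-pad"},
]

# What a command like `git diff --name-only` would list for the same session.
changed_paths = ["src/app.py"]

unseen_reads = read_but_unchanged(events, changed_paths)
print(unseen_reads)
```

In this toy session the diff shows one changed file, while the `.env` read and the `npm install` appear only in the event list.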

The gap becomes a problem during incident review. As BaristaLabs documented in June 2026, teams reconstructing agent activity after the fact typically end up with three fragments: a chat log, a git diff, and scattered shell history that may or may not belong to the agent session in question. That is not an audit trail. It is debris.

What Beacon actually captures

According to the Beacon README, the tool discovers supported local runtimes on a host, configures data collection for those runtimes, and normalizes activity into endpoint events. Three layers: an agent runtime layer pulls from local hooks and OpenTelemetry sources; a Beacon endpoint layer normalizes events and applies retention and redaction settings; an output layer writes JSONL locally, exposes a local dashboard, or forwards to an enterprise SIEM.
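The normalize, redact, and write steps can be sketched in a few lines of Python. This is an illustration of the general pattern under stated assumptions, not Beacon's implementation: the field names (`ts`, `runtime`, `session_id`, `kind`, `detail`), the raw event keys, and the redaction rule are all hypothetical.

```python
import io
import json
import re
from datetime import datetime, timezone

# Hypothetical redaction rule: mask the value in KEY=value pairs whose key looks secret.
SECRET_PATTERN = re.compile(
    r"\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)=(\S+)",
    flags=re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace secret-looking values with a fixed placeholder."""
    return SECRET_PATTERN.sub(r"\1=[REDACTED]", text)


def normalize(raw: dict, runtime: str) -> dict:
    """Map a runtime-specific raw event onto one common shape (assumed fields)."""
    return {
        "ts": raw.get("timestamp") or datetime.now(timezone.utc).isoformat(),
        "runtime": runtime,
        "session_id": raw.get("session", "unknown"),
        "kind": raw.get("type", "unknown"),
        "detail": redact(str(raw.get("payload", ""))),
    }


def write_jsonl(events, stream) -> None:
    """Write one JSON object per line, the JSONL convention."""
    for event in events:
        stream.write(json.dumps(event, sort_keys=True) + "\n")


# Hypothetical raw event from a shell hook.
raw = {
    "timestamp": "2026-06-22T10:00:00+00:00",
    "session": "s1",
    "type": "shell",
    "payload": "export GITHUB_TOKEN=abc123 && make deploy",
}

buffer = io.StringIO()  # stands in for a local .jsonl file
write_jsonl([normalize(raw, "claude-code")], buffer)
print(buffer.getvalue())
```

Redacting before the write, rather than at read time, means the secret never reaches disk or a downstream SIEM. That is one reasonable design choice, not a claim about how Beacon orders these steps.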

Practically, this means prompts, tool calls, file reads, file writes, shell commands, and process activity, all connected to the original human instruction. The GitHub repository was created May 12, 2026 and had 192 stars and 7 forks as of June 8, 2026, per BaristaLabs' reporting. Those are early-adopter numbers. What matters is not the adoption curve but what the project reveals about the default absence of this data.

Why June 2026 produced a cluster of these tools

Three separate releases in the same two-week window signal something beyond coincidence. New Relic launched open-source AI Coding Observability in early June, explicitly targeting the blind spot where coding assistants like Claude Code, Cursor, and GitHub Copilot operate outside traditional observability stacks. Cursor shipped Auto-review on June 11, a governance mechanism for local agents that have access to files, credentials, environment variables, and production-adjacent systems. DeusData's codebase-memory-mcp v0.8.0, released June 11, added persistent queryable memory for agents across 9 languages, including Java, Kotlin, and Rust, so agents can maintain context across sessions rather than reconstructing it from scratch each time.

Each of these tools addresses a different part of the same structural problem: agents do work that is consequential, multi-step, and largely invisible to the review processes built around human-written diffs. The tooling category is forming now because agent autonomy reached the level where the absence of telemetry became operationally dangerous, not merely theoretically inconvenient.

The policy-before-visibility trap

A predictable response to agent risk is to reach for allowlists. Block these directories, require approval before that tool, disable shell access in CI. Policy is not wrong. But as Clord's June 11 writeup on agent observability put it, a chat transcript tells you what the agent said, not what happened. Policy without a behavioral record produces two failure modes simultaneously: over-blocking legitimate work because reviewers cannot distinguish routine tool calls from dangerous ones, and under-blocking dangerous work because the dangerous calls do not appear in the policy surface being monitored.

Context is the problem. A kubectl command during an infrastructure incident is appropriate. The same command as a side effect of a unit test fix is not. A .env read during debugging is expected. The same read during a documentation edit is a flag. Static allowlists cannot distinguish these cases. Behavioral telemetry, combined with task context, can.
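The distinction can be sketched as a rule that takes the task type as an input instead of judging the action alone. The task labels, the `Action` shape, and the `EXPECTED` mapping below are hypothetical, chosen only to mirror the four cases in the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str    # "shell" or "file_read"
    target: str  # command line or file path


# Hypothetical mapping from task type to the sensitive actions it plausibly needs.
EXPECTED = {
    "incident_response": {"kubectl"},
    "debugging": {".env"},
    "unit_test_fix": set(),
    "docs_edit": set(),
}


def sensitive_marker(action: Action) -> Optional[str]:
    """Name the sensitive capability an action touches, if any."""
    if action.kind == "shell" and action.target.split()[:1] == ["kubectl"]:
        return "kubectl"
    if action.kind == "file_read" and action.target.rsplit("/", 1)[-1] == ".env":
        return ".env"
    return None


def should_flag(action: Action, task_type: str) -> bool:
    """Flag a sensitive action only when the task gives no reason for it."""
    marker = sensitive_marker(action)
    if marker is None:
        return False
    # Unknown task types expect nothing, so any sensitive action is flagged.
    return marker not in EXPECTED.get(task_type, set())


kubectl = Action("shell", "kubectl get pods")
env_read = Action("file_read", "config/.env")

for action, task in [
    (kubectl, "incident_response"),
    (kubectl, "unit_test_fix"),
    (env_read, "debugging"),
    (env_read, "docs_edit"),
]:
    print(task, action.target, "flag" if should_flag(action, task) else "ok")
```

A static allowlist amounts to calling `should_flag` without the `task_type` argument: the same `kubectl` command gets the same verdict in every context.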

What this means for code review

Hyrax operates at exactly the surface these tools are now trying to instrument. A reviewer looking only at a diff is reviewing the output of a process they cannot see. Hyrax reasons about how code was likely produced, what patterns are present across the full repository context, and what the fix needs to be before the PR is created.

The broader implication for engineering teams: code review is already downstream of agent behavior. As agents take on more autonomous work, the diff will increasingly be the least informative artifact in the chain. Teams that configure telemetry now, even minimally, even just JSONL on a developer laptop, will have a reconstruction path when something goes wrong. Teams that do not will be left arguing from fragments.

Beacon is one tool. The requirement it addresses is not going away. The telemetry layer is becoming the new audit trail, and right now most teams have neither.

Hyrax is live at hyrax.dev.
