RESEARCH · JUNE 10, 2026 · 5 MIN READ

The OpenSSL CVE That Flipped the Audit Math

CVE-2026-45447 was found by AI, not auditors. The implication for every engineering team running sampled reviews is concrete and immediate.



OpenSSL patched 18 vulnerabilities on June 9, 2026. One of them, CVE-2026-45447, is a heap use-after-free in the PKCS#7 verification path, rated high severity, and capable of remote code execution. It was found by a researcher working with Claude AI and Anthropic Research. Prior external audits, including work by firms with established reputations in cryptographic code review, did not catch it. That gap is worth pausing on.

What the Bug Is and Why It Survived

CVE-2026-45447 triggers when a PKCS#7 or S/MIME signed message carries a SignedData structure whose digestAlgorithms field is an empty ASN.1 SET. Under that condition, OpenSSL may free a caller-owned BIO inside PKCS7_verify(). Any subsequent use of that BIO by the calling application is a use-after-free, which can corrupt the heap, crash the process, or enable code execution.
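The ownership violation can be illustrated with a toy model. The sketch below is not OpenSSL code: ToyBIO, buggy_verify, fixed_verify, and caller are hypothetical names invented for illustration, and Python raises an exception at the point where C would silently touch freed heap memory. It only shows the shape of the bug, a callee releasing a handle the caller still owns on one malformed-input branch.

```python
class ToyBIO:
    """Hypothetical stand-in for an I/O handle with an explicit freed state."""

    def __init__(self, data: bytes):
        self._data = data
        self.freed = False

    def free(self) -> None:
        self.freed = True
        self._data = None

    def read(self) -> bytes:
        if self.freed:
            # In C this would be undefined behavior, not a clean error.
            raise RuntimeError("use-after-free: handle was already released")
        return self._data


def buggy_verify(digest_algorithms: list, content: ToyBIO) -> bool:
    """Hypothetical callee that breaks the ownership convention on one edge case."""
    if not digest_algorithms:   # empty set: the malformed-input branch
        content.free()          # BUG: releases a handle the caller still owns
        return False
    return True


def fixed_verify(digest_algorithms: list, content: ToyBIO) -> bool:
    """Same check, but the caller's handle is left untouched on rejection."""
    if not digest_algorithms:
        return False
    return True


def caller(verify, digest_algorithms: list):
    """The caller assumes it still owns the handle after verify() returns."""
    bio = ToyBIO(b"signed content")
    ok = verify(digest_algorithms, bio)
    try:
        bio.read()
        outcome = "handle still valid"
    except RuntimeError:
        outcome = "use-after-free"
    return ok, outcome


if __name__ == "__main__":
    print(caller(buggy_verify, ["sha256"]))  # well-formed input: bug stays hidden
    print(caller(buggy_verify, []))          # empty set: caller hits the freed handle
    print(caller(fixed_verify, []))          # rejection without touching the handle
```

The well-formed case behaves identically in both versions, which is why tests built on valid messages would not distinguish them.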

The flaw is subtle. It requires reasoning about ownership conventions across a caller/callee boundary in a code path that only activates under a specific malformed input condition. Human auditors triage. They allocate time to the paths most likely to be interesting, and a malformed-input edge case in a verification function that rarely receives adversarial input falls below the cut line. Nothing about that triage logic was wrong. It just has a coverage ceiling.

Alex Gaynor of Anthropic was credited with reporting six of the 18 patched vulnerabilities in this release, according to SecurityWeek. Six findings in a single disclosure batch, from one researcher with AI assistance, is not a fluke.

The Scale Behind the Single CVE

CVE-2026-45447 is one data point in a much larger shift. Anthropic's Project Glasswing, a restricted program running Claude Mythos Preview against 1,000+ open-source projects, surfaced more than 10,000 high- and critical-severity vulnerabilities, per Anthropic's own disclosure. Of 530 vulnerabilities disclosed to maintainers, only 75 (under 15%) had been patched as of late May, according to reporting on the Glasswing update.

The patch rate is not the headline. The headline is that open-source maintainers asked Anthropic to slow down disclosures. They did not dispute the accuracy. They disputed the capacity to respond. That is a precise statement about where the bottleneck now sits: not in finding vulnerabilities, but in absorbing the volume of findings that AI-assisted discovery produces.

On the adversarial side, Google's Threat Intelligence Group confirmed the first known case of an AI system discovering and weaponizing a zero-day that was then deployed in the wild. A criminal actor used a frontier model to find a two-factor authentication bypass, build a working exploit, and deploy it before any defender had identified the vulnerability. Discovery-to-exploitation in hours. The same class of capability is now operating on both sides of the line simultaneously.

The Sampling Assumption Is Now a Liability

Security reviews have operated on a triage model for practical reasons. Budgets are finite, codebases are large, and skilled reviewers are expensive. The standard heuristic: spend human attention on the highest-risk surfaces, trust that mature, widely-reviewed code is relatively hardened, and let automated scanners handle the known-CVE matching.

Project Glasswing found decades-old vulnerabilities in mature, widely-reviewed open-source code: bugs that survived millions of automated tests and years of manual scrutiny. The triage heuristic had a coverage ceiling that AI-assisted full-corpus analysis does not share. Vulnerabilities requiring cross-function reasoning about memory ownership, logic state across large call graphs, or malformed-input behavior in rarely-exercised paths are precisely the class that sampling misses and coverage finds.

Cloudflare's CSO Grant Bourzikas captured the triage problem from the other direction: "Ask a model to find bugs, and it will find them, whether the code has any or not. Findings come back hedged with 'possibly,' 'potentially,' and 'could in theory,' and the hedged findings vastly outnumber the solid ones. That's a reasonable bias for an exploratory tool. It's a ruinous one for a triage queue." The false positive rate in Glasswing's independent review came in at 9.4%, with 62.4% of passed findings confirmed as genuinely high or critical per uctoday.com. At that volume, 9.4% noise is still hundreds of false positives, which means the verification step matters as much as the discovery step.
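A quick back-of-the-envelope check shows how a single-digit false positive rate turns into a large absolute queue. The sketch below assumes a volume of 10,000 findings, borrowed from the "more than 10,000" figure cited earlier; the exact sample the 9.4% rate was measured on is not stated here, so the volumes are illustrative and expected_false_positives is a hypothetical helper.

```python
def expected_false_positives(total_findings: int, false_positive_rate: float) -> int:
    """Expected number of findings that are noise, rounded to a whole finding."""
    return round(total_findings * false_positive_rate)


if __name__ == "__main__":
    RATE = 0.094  # false positive rate cited for the independent review
    for volume in (1_000, 10_000):  # assumed volumes, for illustration only
        noise = expected_false_positives(volume, RATE)
        print(f"{volume} findings at {RATE:.1%} noise: about {noise} false positives")
```

Under that assumption the rate works out to roughly 940 false positives per 10,000 findings, each of which still costs reviewer time to dismiss.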

What Changes for Teams Running Review at Scale

The relevant shift for engineering teams is not "AI is better than humans at finding bugs." It is that an adversary running the same class of tooling against a repository no longer faces the same coverage ceiling that human auditors do. Every surface in the codebase is now reachable at cost. The question is whether the defensive review is operating at the same coverage level.

Hyrax reads the entire codebase. Every line. It does not sample. The six agent domains (security, code quality, reliability, API and data, ops, and UX) run in parallel across the full corpus, not against a triaged subset. The 13-step verification process exists specifically to address the signal-to-noise problem that Bourzikas described: findings that do not survive verification do not become PRs. Hyrax submits the PR. The user merges. Nothing auto-merges.
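The gating idea, where a finding must pass every verification check before it can become a PR, can be sketched in a few lines. This is a minimal illustration and not Hyrax's implementation: the Finding fields, the two checks, and verify_findings are all hypothetical, and the actual 13 steps are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """Hypothetical finding record; the fields are assumptions for illustration."""
    domain: str
    summary: str
    reproduced: bool   # was the issue demonstrated rather than guessed?
    hedged: bool       # is the claim phrased as "possibly" or "could in theory"?


Check = Callable[[Finding], bool]


def is_reproduced(finding: Finding) -> bool:
    return finding.reproduced


def is_not_hedged(finding: Finding) -> bool:
    return not finding.hedged


def verify_findings(findings: List[Finding], checks: List[Check]) -> List[Finding]:
    """Keep only findings that pass every check; the rest never reach a PR."""
    return [f for f in findings if all(check(f) for check in checks)]


if __name__ == "__main__":
    raw = [
        Finding("security", "caller-owned handle freed on empty set", True, False),
        Finding("security", "possibly unsafe cast", False, True),
        Finding("reliability", "retry loop could in theory spin", True, True),
    ]
    for finding in verify_findings(raw, [is_reproduced, is_not_hedged]):
        print(finding.domain, "-", finding.summary)
```

Requiring all checks to pass trades recall for precision, which suits a queue whose output goes to human reviewers.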

For teams that have previously argued that human review on "interesting" code paths is sufficient, the OpenSSL disclosure is a useful data point: the heap use-after-free in PKCS7_verify() was not in an obvious high-risk path. It was in a malformed-input edge case in a mature codebase. That is the coverage gap that full-corpus analysis closes, and that sampled review cannot.

The architecture behind this approach is covered in more depth in how Hyrax reviews code.

The Remediation Bottleneck Is Real

Discovery at scale creates a downstream problem that tooling alone cannot solve. Open-source maintainers asking Anthropic to slow down disclosures is a documented outcome of discovery outrunning remediation capacity. Enterprise teams face the same constraint internally: a tool that surfaces 200 findings against a legacy service is only useful if the team can triage, verify, and fix those findings without accumulating a review backlog larger than what the original review process produced.

The answer is not to reduce coverage. It is to couple coverage with verification that filters noise before it reaches human reviewers, and with automated fix generation that handles the findings where the change is mechanical. The 75 patched vulnerabilities out of the 530 disclosed to open-source maintainers represent what happens when disclosure outpaces remediation infrastructure. The ratio will not improve by slowing discovery.

Hyrax is live at hyrax.dev.


Sources

  1. securityweek.com
  2. uctoday.com
  3. thenextweb.com
  4. cybertechnologyinsights.com
  5. pravitech.substack.com