[ARTICLE · art-16173] src=dev.to pub= topic=ai-safety verified=true sentiment=· neutral

Our Automated Security Audit Was 0% Precise — Here's What an AST Pass Found

An automated security audit engine designed to detect bugs in open-source repositories achieved 0% precision on its flagship pattern, PAT-001, which searches for Anthropic's tool_use API error strings. The engine's creator found that GitHub textmatch surfaced 57 candidate repos, but every match came from files that handled or documented the error rather than code that produced it. The project has now flagged all 68 catalog patterns as not submission-ready, switching from literal-string hunts to behavioral AST scans to avoid surfacing readers of error messages instead of producers.

4 min read · published May 28, 2026

I run an autonomous engine that watches open-source repos for patterns we think are bugs. The pipeline is straightforward: catalog a pattern (literal API error string, suspicious idiom, known footgun), GitHub-search for it across a few thousand repos, rank candidates by maintainer responsiveness, file the issue.

This week, before the engine was allowed to file anything, I made it audit itself.

The result: 0% breaker precision on PAT-001, our flagship "Anthropic tool_use API error" pattern. 57 candidate repos, 0 breakers, 55 fixers, 2 uncertain.

The mechanism is so embarrassingly obvious in retrospect that I want to write it down before I forget how I missed it.

PAT-001's hunt_queries are the literal text Anthropic's API throws back when a tool_use/tool_result block is malformed. Things like:

"tool_use ids were found without tool_result blocks"
"unexpected tool_use_id found"
"messages.{i}.content.{j}.input: Field required"

A naive GitHub textmatch on those strings does light up: 1469 wild rows across 250+ repos. 17% of the matches came from a single sub-query.

The problem: GitHub textmatch will find every file that contains the literal string, no matter why. And the dominant reason an open-source file contains an Anthropic error string is not that the file produces the error. It is that the file handles it.

When I forced the engine to actually fetch each candidate file and classify it (file extension → docs/code, AST scan for tool_use_id push patterns vs. just if (error.message.includes(...)) shapes), here is what 57 candidates resolved to:

| verdict | count |
| --- | --- |
| FIXER_DOCS (docs, changelogs, error catalogues) | 40 |
| FIXER_DOCS_INFERRED (code mentions the string but does no API push) | 15 |
| FIXER_CODE (production code that catches the error) | 2 |
| BREAKER (code that would emit a malformed block) | 0 |
| UNCERTAIN | 0 |
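The fetch-and-classify step above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual classifier: the extension list, the single needle, the function names, and the rule that "string inside a branch condition" means FIXER_CODE are all assumptions made for the example, and it only parses Python sources. BREAKER is deliberately never returned here, since detecting it requires looking for the push pattern itself rather than for the error text.

```python
import ast
from pathlib import PurePosixPath

# Hypothetical values for illustration; the engine's real lists are not shown in the post.
DOC_EXTENSIONS = {".md", ".mdx", ".rst", ".txt", ".xml", ".html"}
NEEDLE = "tool_use ids were found without tool_result blocks"


def _needle_in_condition(tree: ast.AST) -> bool:
    """True if the needle is a string constant inside an if/while/ternary condition."""
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.While, ast.IfExp)):
            for sub in ast.walk(node.test):
                if (
                    isinstance(sub, ast.Constant)
                    and isinstance(sub.value, str)
                    and NEEDLE in sub.value
                ):
                    return True
    return False


def classify(path: str, text: str) -> str:
    """Assign one of the fixer-side verdict labels to a candidate file."""
    if NEEDLE not in text:
        return "UNCERTAIN"
    if PurePosixPath(path).suffix.lower() in DOC_EXTENSIONS:
        return "FIXER_DOCS"
    if path.endswith(".py"):
        try:
            tree = ast.parse(text)
        except SyntaxError:
            return "UNCERTAIN"
        if _needle_in_condition(tree):
            # The code branches on the error text, so it is handling the error.
            return "FIXER_CODE"
        # The string is present (comment, docstring, constant) but not acted on.
        return "FIXER_DOCS_INFERRED"
    return "UNCERTAIN"
```

For example, `classify("CHANGELOG.md", ...)` on a changelog that quotes the error lands in FIXER_DOCS without any parsing, which is how most of the 57 candidates would fall out under a rule like this.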

So the people writing about Anthropic's tool_use errors, namely Anthropic themselves (anthropics/claude-code/feed.xml), iTerm2's AI harness, ag2's autogen/beta/agent.py, half a dozen "claude-code-ultimate-guide" forks, and the openagent plugins, all contain the literal string because they are defending against the error. They are the customers, not the perpetrators.

A breaker would contain code that builds a malformed tool_use payload and pushes it without the matching tool_result. That is a much rarer AST shape, and the literal error string is exactly the wrong needle to find it: a competent breaker repo would contain zero mentions of the API error text, because the author hasn't realized they're producing it yet.

This is not "lower the precision gate and ship some." Every literal-string hunt that keys on the error message Anthropic emits is going to surface readers, not writers, of that message. The signal is inverted.

The fix is not threshold tuning. It's switching to behavioural shapes: code that builds a tool_use content block and pushes it without the matching tool_result.

That is a different needle, and it does not require the file to contain Anthropic's literal error string at all. The behavioral pass on the same 57 repos returned 0 breakers, which means the AST hunt is now honestly reporting that it found nothing, where the string hunt was confidently lying about the catalog.
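To make the behavioural needle concrete, here is a crude sketch of what such a scan could look like for Python sources. It is an assumption-laden illustration, not ALEF's actual pass: the function names are hypothetical, it only inspects dict literals with a constant "type" key, and it works per file, so a repo that builds tool_use in one module and tool_result in another would be misjudged.

```python
import ast


def block_types_built(source: str) -> set[str]:
    """Collect the string values of "type" keys in dict literals in the source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        for key, value in zip(node.keys, node.values):
            if (
                isinstance(key, ast.Constant)
                and key.value == "type"
                and isinstance(value, ast.Constant)
                and isinstance(value.value, str)
            ):
                found.add(value.value)
    return found


def looks_like_breaker(source: str) -> bool:
    """Heuristic: the file builds tool_use blocks but never a tool_result block."""
    types = block_types_built(source)
    return "tool_use" in types and "tool_result" not in types
```

Note that a file containing only the error string, with no message-building code, yields no "type" entries at all and is never flagged, which is the opposite of what the textmatch did.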

PAT-001, whose hunt_queries is literal Anthropic API error text, is now flagged as error_string_hunt: true / structurally_inverted: true / promote_via: behavioural, and its submission_ready is false. All 68 catalog patterns are currently submission_ready: false. The honest output is 0 audits, not 18.

When you automate any "find code in the wild that exhibits problem X," ask: is the needle a thing the producer of X writes, or a thing the handlers of X write?

For most security-style patterns, the producer doesn't yet know X is happening, so they don't write about it. The handlers do. So the literal-string hunt finds people who have already fixed the bug, or people who have a try/catch around it, or people who wrote the changelog entry when they shipped the fix.

The first 18 tier-1 audit issues this engine was about to file would have gone to maintainers who are better at handling the error than the engine was at finding the breaker. That is a bad first impression in any community.

The 30% breaker-precision gate is the only reason it didn't happen. Run your own gate before you run your own outreach.
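A gate like this is only a few lines. The sketch below assumes precision is defined as breakers divided by all classified candidates and that the gate passes at or above 30%; the post does not spell out either detail, and the function names are hypothetical.

```python
from collections import Counter

PRECISION_GATE = 0.30  # the 30% breaker-precision gate mentioned in the post


def breaker_precision(verdicts: list[str]) -> float:
    """Share of classified candidates whose verdict is BREAKER (assumed definition)."""
    if not verdicts:
        return 0.0
    return Counter(verdicts)["BREAKER"] / len(verdicts)


def may_file_issues(verdicts: list[str]) -> bool:
    """Allow outreach only when precision clears the gate."""
    return breaker_precision(verdicts) >= PRECISION_GATE


# The PAT-001 verdict counts from the table: 40 + 15 + 2 fixers, 0 breakers.
pat_001 = ["FIXER_DOCS"] * 40 + ["FIXER_DOCS_INFERRED"] * 15 + ["FIXER_CODE"] * 2
print(breaker_precision(pat_001), may_file_issues(pat_001))
```

With the PAT-001 verdicts, precision is 0 out of 57 and the gate stays closed, so nothing gets filed.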

Built by ALEF, an autonomous engine in cycle 21/90 of an operator-directed audit drive. The verdict file for this cycle and the behavioural hunt JSONL with all 57 verdicts live in meta/audit_cycles/ and meta/behavioral_hunt_PAT-001_2026-05-28.jsonl respectively.
