Source: dev.to · topic: ai-safety

Making LLM security verdicts verifiable: the evidence gate pattern

A developer built USAP, an open-source system that enforces a strict evidence gate on LLM security verdicts: every verdict must cite a resolvable source. Output is validated against a contract of 11 typed JSON fields, and evidence references that don't resolve are rejected, which filters out hand-wavy output. USAP ships with a held-out corpus of real incidents for evaluation and runs as an MCP server or as system prompts, aiming to make AI security analyst output verifiable and trustworthy.

2 min read · published Jul 3, 2026

Every "AI security analyst" I tried had the same flaw: a correct verdict and a confident-but-wrong one are indistinguishable on screen. In security that's not a UX nit — it's the whole problem. So I built USAP around a single rule, and this post is about that rule and three things that fell out of it.

USAP's output contract is 11 typed JSON fields. The uncompromising one is evidence_references: every verdict must carry at least one source that resolves. Four forms are accepted:

- mcp:<logical>:<tool>:<call_id> for evidence fetched live from a connected MCP server
- https://... for a canonical external source (CVE, EPSS feed, MITRE)
- s3://... for an operator artifact
- local://<repo-path> for an in-repo standard, used by advisory verdicts (the path must exist)

A source like "scanner" or "the SIEM showed it" is rejected at the contract boundary. That one check removes most hand-wavy output, because the model can't satisfy the contract without pointing at something real.
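To make the gate concrete, here is a minimal Python sketch of a validator for evidence_references. The post doesn't publish USAP's exact grammar, so the regular expressions, the function names check_evidence_ref and validate_verdict, and the choice to reject a verdict if any single reference fails (the stricter reading of "at least one source that resolves") are all assumptions for illustration.

```python
import os
import re
from typing import List

# Hypothetical grammars for the four accepted forms; USAP's exact rules are assumed here.
MCP_REF = re.compile(r"^mcp:[A-Za-z0-9_-]+:[A-Za-z0-9_.-]+:[A-Za-z0-9_.-]+$")  # mcp:<logical>:<tool>:<call_id>
HTTPS_REF = re.compile(r"^https://[^\s/]+\S*$")  # canonical external source
S3_REF = re.compile(r"^s3://[^\s/]+/\S+$")       # operator artifact (bucket/key)
LOCAL_PREFIX = "local://"                         # in-repo standard


def check_evidence_ref(ref: str, repo_root: str = ".") -> bool:
    """Return True if `ref` has an accepted form (and, for local://, the path exists)."""
    if MCP_REF.match(ref) or HTTPS_REF.match(ref) or S3_REF.match(ref):
        return True
    if ref.startswith(LOCAL_PREFIX):
        rel = ref[len(LOCAL_PREFIX):]
        root = os.path.realpath(repo_root)
        target = os.path.realpath(os.path.join(root, rel))
        # The path must be non-empty, stay inside the repository, and exist.
        inside = os.path.commonpath([root, target]) == root
        return bool(rel) and inside and os.path.exists(target)
    # Free text such as "scanner" or "the SIEM showed it" ends up here.
    return False


def validate_verdict(verdict: dict, repo_root: str = ".") -> List[str]:
    """Return contract violations for evidence_references; an empty list means the gate passes."""
    refs = verdict.get("evidence_references")
    if not isinstance(refs, list) or not refs:
        return ["evidence_references must be a non-empty list"]
    return [f"unresolvable evidence reference: {r!r}"
            for r in refs
            if not isinstance(r, str) or not check_evidence_ref(r, repo_root)]
```

This sketch only checks the shape of mcp:, https:// and s3:// references. Actually resolving them (for example, confirming that a call_id was really issued during the session) would need access to the tool-call log, which is outside what a pure validator can see.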

If a verdict cites mcp:siem:search, that citation can't be hard-coded to Splunk, because not everyone runs Splunk. So agents declare logical capabilities, and a registry resolves them to whatever backend the operator connected (Splunk, Elastic, Sentinel). If nothing implements a capability, the agent degrades gracefully and marks that axis UNKNOWN rather than inventing telemetry.
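A capability registry along these lines might look roughly like the following hypothetical Python sketch. The CapabilityRegistry class, the run_axis helper, the "siem:search" key format, and the shape of the returned dict are invented for illustration; the point is that an unresolved capability yields UNKNOWN with no records instead of fabricated telemetry.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: a backend is any callable that takes a query string and returns records.
Backend = Callable[[str], List[dict]]

UNKNOWN = "UNKNOWN"


class CapabilityRegistry:
    """Maps logical capabilities (e.g. "siem:search") to operator-connected backends."""

    def __init__(self) -> None:
        self._impls: Dict[str, Tuple[str, Backend]] = {}

    def register(self, capability: str, vendor: str, fn: Backend) -> None:
        # The operator wires in whatever they run: Splunk, Elastic, Sentinel, ...
        self._impls[capability] = (vendor, fn)

    def resolve(self, capability: str) -> Optional[Tuple[str, Backend]]:
        return self._impls.get(capability)


def run_axis(registry: CapabilityRegistry, capability: str, query: str) -> dict:
    """Query one evidence axis; degrade to UNKNOWN instead of inventing telemetry."""
    impl = registry.resolve(capability)
    if impl is None:
        return {"capability": capability, "status": UNKNOWN, "records": []}
    vendor, fn = impl
    return {"capability": capability, "status": "OK", "vendor": vendor,
            "records": fn(query)}


# Usage: the agent only ever names "siem:search"; the registry picks the backend.
registry = CapabilityRegistry()
registry.register("siem:search", "elastic", lambda q: [{"query": q, "hits": 3}])
print(run_axis(registry, "siem:search", "user=alice")["status"])       # OK
print(run_axis(registry, "edr:process_tree", "host=web-1")["status"])  # UNKNOWN
```

Keeping the vendor name in the result lets the verdict's mcp: citation record which concrete backend served the logical capability, without the agent's prompt ever mentioning it.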

Once you demand resolvable evidence, narrated numbers feel wrong too. CVSS is computed from the published vector. EPSS is pulled live from the FIRST feed. Confidence comes from a written rubric. If a number can't be computed, the tool returns "qualitative" instead of fabricating one, and the contract rejects a cvss_score that disagrees with its cited vector.
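Computing rather than narrating a score is straightforward for CVSS. The sketch below assumes CVSS v3.1 base vectors and uses the base-score formula and metric weights from the FIRST v3.1 specification; cvss31_base, cvss_consistent, and the 0.05 tolerance are hypothetical names and choices, not USAP's code. Anything it can't parse comes back as "qualitative", mirroring the behavior described above.

```python
import math
from typing import Union

# CVSS v3.1 base-metric weights (FIRST CVSS v3.1 specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # PR weights differ when scope changes
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, robust to float noise."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def cvss31_base(vector: str) -> Union[float, str]:
    """Compute the base score, or return "qualitative" if the vector can't be parsed."""
    parts = vector.split("/")
    if parts[0] != "CVSS:3.1":
        return "qualitative"
    try:
        m = dict(p.split(":", 1) for p in parts[1:])
        scope_changed = {"U": False, "C": True}[m["S"]]
        pr = (PR_CHANGED if scope_changed else PR_UNCHANGED)[m["PR"]]
        iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
        exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    except (KeyError, ValueError):
        return "qualitative"  # missing or malformed metric: refuse to guess
    if scope_changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    if scope_changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


def cvss_consistent(cvss_score: float, cvss_vector: str) -> bool:
    """Gate check: reject a cvss_score that disagrees with its cited vector."""
    computed = cvss31_base(cvss_vector)
    return isinstance(computed, float) and abs(computed - cvss_score) < 0.05
```

For example, a verdict citing CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H can only carry a cvss_score of 9.8; a narrated 7.5 fails the check.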

A system that enforces evidence shouldn't grade itself against its own answer key. USAP ships a held-out corpus of real public incidents (Log4Shell, xz, Capital One, MOVEit) plus benign false-positive traps, and a stdlib harness that reports precision, recall, FPR, and MTTD.
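A held-out evaluation harness mostly needs a confusion matrix and a timing average. This stdlib-only Python sketch assumes each corpus case records whether it is a real incident, whether the system flagged it, and (for incidents) minutes from onset to detection. The Case fields and the score function are hypothetical, and MTTD (mean time to detect) is taken as the mean over detected incidents only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Case:
    """One corpus entry: a real incident (malicious=True) or a benign false-positive trap."""
    name: str
    malicious: bool
    flagged: bool
    minutes_to_detect: Optional[float] = None  # only meaningful for flagged incidents


def score(cases: List[Case]) -> dict:
    """Precision, recall, false-positive rate, and MTTD; None where a ratio is undefined."""
    tp = sum(c.malicious and c.flagged for c in cases)
    fp = sum(not c.malicious and c.flagged for c in cases)
    fn = sum(c.malicious and not c.flagged for c in cases)
    tn = sum(not c.malicious and not c.flagged for c in cases)
    detect_times = [c.minutes_to_detect for c in cases
                    if c.malicious and c.flagged and c.minutes_to_detect is not None]
    return {
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        "fpr": fp / (fp + tn) if fp + tn else None,
        "mttd_minutes": mean(detect_times) if detect_times else None,
    }
```

Returning None for an undefined ratio (for example, precision when nothing was flagged) keeps the harness from reporting a fabricated 0 or 1, in the same spirit as returning "qualitative" for a score that can't be computed.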

USAP is open-source (Apache-2.0), stdlib-only, and runs as an MCP server or as paste-in system prompts in any model. Every example in the repo is a real command you can re-run. I'd value feedback on the gate design.
