Auditing the auditor with four AI agents


Source: dev.to · Published 2026-07-04 11:42 UTC · Topic: AI agents


Turva.dev, an audit business, ran its own site through four AI agents running Claude Fable 5. A line-by-line review of the site's public surface, including about 5,400 lines of Worker source, produced 91 findings. Most were minor drift. Four HIGH-severity findings turned out to be false alarms once verified against primary sources, while one real gap (platform observability logging switched on despite a README promise of no logging) was fixed by disabling observability. The exercise shows that automated scanners miss discrepancies like these, and that findings must be verified against primary sources before anyone acts on them.


The company page of turva.dev tells a buyer they can read every line before hiring me. An audit business should survive its own promise, so I turned the audit on my own site. Four AI agents, all running Claude Fable 5, read the public surface line by line: the Worker source that renders turva.dev (about 5,400 lines), the MCP server behind mcp.turva.dev, and the READMEs of the public repos. They came back with 91 findings.

Most were the drift every living codebase accumulates. One surface advertised RS256 and ES256 for verification while the site's actual key is Ed25519. A response header named x-markdown-tokens carried a word count rather than a token count. A guide expanded MPP to the wrong protocol name. A table in one guide had never rendered as a table, because the renderer did not support tables. The legal page called this a registered company when it is a registered business. None of these would move a scanner's score.
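The key-algorithm drift is the kind of mismatch a small mechanical check can catch once someone thinks to look. The sketch below assumes the site publishes its verification key as a JWK and lists its supported algorithms somewhere; the function names are hypothetical, and the mapping only covers the key types mentioned here. Accepting the fully specified name "Ed25519" alongside RFC 8037's "EdDSA" is an assumption about newer registries.

```python
# Hypothetical drift check: can the published key actually verify every
# JWS algorithm the site advertises? Names here are illustrative only.

def algs_for_jwk(jwk: dict) -> set[str]:
    """Map a JWK's key type (and curve) to the JWS "alg" values it can verify."""
    kty, crv = jwk.get("kty"), jwk.get("crv")
    if kty == "OKP" and crv == "Ed25519":
        # RFC 8037 uses "EdDSA"; also accepting the fully specified name
        # "Ed25519" is an assumption about newer registries.
        return {"EdDSA", "Ed25519"}
    if kty == "EC" and crv == "P-256":
        return {"ES256"}
    if kty == "RSA":
        return {"RS256", "RS384", "RS512", "PS256", "PS384", "PS512"}
    return set()  # unknown key type: treat nothing as verifiable


def unsupported_advertised(advertised: list[str], jwk: dict) -> list[str]:
    """Return the advertised algorithms that the published key cannot verify."""
    usable = algs_for_jwk(jwk)
    return [alg for alg in advertised if alg not in usable]


site_key = {"kty": "OKP", "crv": "Ed25519", "x": "..."}  # placeholder public key
print(unsupported_advertised(["RS256", "ES256"], site_key))  # ['RS256', 'ES256']
```

A scanner that only checks whether a key and an algorithm list exist would pass both; the comparison between them is what exposes the drift.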

About 60 fixes shipped, and both scanners were re-run after the deploys: startuphub.ai reads 100/100, grade A+, with all six categories at 100, and isitagentready.com reads Level 5. The scores were the same before most of these fixes, and that is the point. A scanner cannot see whether the key algorithm you advertise is the one you use. Line-by-line reading is the layer under the score.

Four of the findings the agents marked HIGH fell when verified, and they traced to two root causes.

The first: the site claims a score of 100/100, verified by two independent scanners, and the agents knew that one of those scanners, isitagentready.com, grades sites on levels from 0 to 5. A percentage from a level-based scanner reads like an invented number, so the agents flagged the claim as false advertising on the audit's own subject matter. The scanner's own scorecard settles it: run the scan and the report shows 100/100 for this site next to Level 5. The claim stands as written.

The second: an agent fetched the live MCP server card and read version 1.1.0 where the source says 1.2.0. Deployed code that trails its repo is a real problem anywhere, so HIGH was the right severity for the claim. It was still wrong. The fetch had come through a cache, and pulling the deployed Worker straight from the Cloudflare API showed 1.2.0, identical to the source. The finding described the measuring instrument, not the deployment, which was never out of sync.
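The triage step here can be written down as a rule: a version mismatch seen through an ordinary fetch only becomes a finding if the platform's own record of the deployed artifact agrees. The sketch below assumes you already have three version strings in hand (repo, cached fetch, platform record); how you obtain the platform record is left out, and the function names are hypothetical.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn "1.2.0" into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def triage_version_finding(source: str, observed: str, deployed: str) -> str:
    """Classify a 'deployed version differs from source' finding.

    source:   version string in the repository
    observed: version the agent saw through an ordinary (possibly cached) fetch
    deployed: version read from the platform's own record of the deployed artifact
    """
    if observed == source:
        return "no finding"
    if deployed == source:
        # The live artifact matches the repo; only the measuring path was stale.
        return "dropped: instrument (cached response)"
    if parse_version(deployed) < parse_version(source):
        return "confirmed: deployment trails source"
    return "confirmed: deployment differs from source"


print(triage_version_finding("1.2.0", "1.1.0", "1.2.0"))
```

With the values from this audit the call prints `dropped: instrument (cached response)`. The key design choice is that the cached observation can raise a question but never settles it.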

One HIGH survived. The MCP server's README promised that the service does no logging, and the Worker configuration had platform observability switched on, which stored a log line for every call. Promise and code disagreed, and this is the exact class of gap the audit exists to catch. The repair went the honest way around. Reality changed to match the words: observability is off, and the README now also says out loud that platform logs are disabled. Rewriting the README to say minimal logging would have been faster to ship, and worth less to anyone who reads it.
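A promise-versus-config gap like this one can be kept from reappearing with a check that reads both sides. The sketch below assumes the Worker's Wrangler configuration has already been parsed into a dict and that logging is controlled by an `[observability]` table with an `enabled` key, as Cloudflare's Workers Logs documentation describes. The promise phrases, the function name, and the treatment of a missing table as "off" are all assumptions for illustration.

```python
import re

# Hypothetical phrasings of a "no logging" promise; a real check would list
# the exact sentences the README commits to.
NO_LOGGING_PROMISE = re.compile(r"\b(?:no logging|does not log|doesn't log)\b", re.IGNORECASE)


def logging_promise_violations(readme: str, config: dict) -> list[str]:
    """Flag a README that promises no logging while the Worker config enables observability.

    `config` is the parsed Wrangler configuration (for example via tomllib on
    Python 3.11+). Treating a missing [observability] table as disabled is an
    assumption; confirm the platform default before relying on it.
    """
    promises_no_logging = NO_LOGGING_PROMISE.search(readme) is not None
    logging_on = bool(config.get("observability", {}).get("enabled", False))
    if promises_no_logging and logging_on:
        return ["README promises no logging, but observability.enabled is true"]
    return []


readme = "This MCP server does no logging."
print(logging_promise_violations(readme, {"observability": {"enabled": True}}))
print(logging_promise_violations(readme, {"observability": {"enabled": False}}))  # []
```

The check deliberately fails in the direction the article chose: it is satisfied by turning observability off, not by softening the README.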

A finding is a claim, and a claim gets the same treatment as marketing copy. Verify it against the primary source or drop it. Acting on the dead alerts here would have made the site worse, because fixing a correct claim plants a real error where a false alarm used to be. Read the scanner's own scorecard instead of assuming its scale, and pull the deployed artifact from the platform instead of trusting a cached fetch. Minutes of checking killed four HIGHs.

The same discipline applies when you buy an audit. The report that reaches you should be the survivors, and a useful question for any auditor is how many findings were dropped between the raw scan and the written report. A report where the answer is zero usually means nobody checked.
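The verification discipline and the "how many were dropped" question fit in one small pipeline: every raw finding carries a re-check against its primary source, only survivors reach the report, and the dropped count is reported alongside them. This is a minimal sketch; the `Finding` type and the callable hook are hypothetical, and the example findings are illustrative rather than the audit's actual data.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    claim: str
    severity: str
    # Hypothetical hook: re-checks the claim against the primary source
    # (the scanner's own scorecard, the platform's deployed artifact, ...).
    confirmed_by_primary_source: Callable[[], bool]


def triage(raw: list[Finding]) -> tuple[list[Finding], int]:
    """Return the findings that survive verification and how many were dropped."""
    survivors = [f for f in raw if f.confirmed_by_primary_source()]
    return survivors, len(raw) - len(survivors)


findings = [
    Finding("Score claim uses a scale the scanner does not report", "HIGH", lambda: False),
    Finding("README promises no logging; observability is on", "HIGH", lambda: True),
]
report, dropped = triage(findings)
print(len(report), dropped)  # 1 1
```

A report that always comes back with `dropped == 0` is the signal the article warns about: either every finding was perfect, or nobody ran the re-checks.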

For an agent-readiness audit where the findings are verified before you read them, contact info@turva.dev.

Originally published at https://turva.dev/blog/auditing-the-auditor
