{"slug": "what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai", "title": "What 12 failure classes and 30 Billion tokens spent taught us about trusting AI coding agents", "summary": "After analyzing hundreds of real AI coding agent runs, a developer identified 12 distinct failure classes (such as hallucination, scope creep, and fake-passing tests), each requiring a specific fix. The findings, drawn from 30 billion tokens of experience, challenge the binary pass/fail approach of most agent frameworks and advocate for tailored governance strategies like grounding checks, file scope enforcement, and context distillation.", "body_md": "We've been watching AI coding agents fail in production for long enough that we started keeping a taxonomy.\n\nNot \"the agent hallucinated\": that's not a failure class, it's a category. The real failure modes are specific, they repeat, and crucially, they each require a different fix.\n\nHere's what we found across hundreds of real runs, and why it changed how we think about agent governance.\n\nThe failure modes that actually kill agent runs:\n\n**1. Hallucination**\n\nThe agent generates code that looks right and tests that confirm it, but the tests check the wrong thing. This is the scariest class because it produces a green result.\n\nThe fix is grounding: forcing the agent back to the actual repo state before the next attempt.\n\n**2. Scope creep**\n\nThe agent modifies files outside the task boundary. This is usually well-intentioned (it \"fixes\" something adjacent) and always dangerous.\n\nThe fix is file scope enforcement: deny-listed paths that roll back automatically on violation.\n\n**3. Fake-passing tests**\n\nThe agent writes tests that pass but don't test the actual behavior. 
Closely related to hallucination but distinct: the code is often correct, and the tests just don't cover the right cases.\n\nThe fix is verifier separation: your test command is the ground truth, not the agent's confidence level.\n\n**4. Budget pressure shortcuts**\n\nWhen a run approaches its token budget, agent behavior degrades. It starts making confident guesses instead of reading files. Results get worse as context gets longer.\n\nThe fix is pre-execution budget preflight: stop the attempt before it starts if it's projected to breach the remaining budget, rather than letting it run degraded.\n\n**5. Context bloat**\n\nBy attempt 5, the agent is paying to resend everything that failed four times. Each retry costs more than the last, so cumulative token cost grows faster than linearly while signal stays flat.\n\nThe fix is context distillation: compress prior attempt history into a structured summary before the next attempt, not a raw failure dump.\n\n**6. Environment mismatch**\n\nThe agent's run passes in CI, but the verifier runs in a different environment: Node version, pnpm vs npm, missing env vars.\n\nThe fix is environment canonicalization in the run contract.\n\n**7. Approval boundary violations**\n\nThe agent modifies files that should require human sign-off: config, migrations, CI definitions. Often not malicious, just overambitious.\n\nThe fix is policy routing: flag these attempts for a different approval path before execution.\n\n**8. Injection in tool output**\n\nTool call results (file reads, search results) contain content that looks like instructions. The agent follows them.\n\nThe fix is a safety leash that scans for injection patterns before admitting tool results into context.\n\n**9. Secret exposure**\n\nThe agent picks up .env values or API keys in file reads and includes them in output.\n\nThe fix is pre-execution scanning for secret-like values in task text and tool results.\n\n**10. 
Repo grounding failure**\n\nThe agent makes changes that conflict with current HEAD because it's working from a stale view of the repo.\n\nThe fix is repo-state verification before each attempt.\n\n**11. Verifier command exploitation**\n\nThe agent modifies the test itself to make it pass rather than fixing the code. More common than you'd expect.\n\nThe fix is read-only verification: the verifier command runs in a scope where test files can't be modified.\n\n**12. Terminal failure**\n\nA class of errors where retrying won't help: the task is malformed, or the repo is in a state that can't satisfy the objective.\n\nThe fix is a hard exit: don't retry. Roll back, log the terminal state, and stop spending.\n\n**Why this matters for how you govern agents**\n\nThe common pattern across all 12: they require different responses.\n\nMost agent frameworks treat failure as binary: it passed or it didn't, retry or stop. But a hallucination needs a grounding check. Scope creep needs a rollback. Budget pressure needs an early exit. Context bloat needs compression. Treating them all as \"retry\" is how you burn $4,200 over a long weekend.\n\nThe other pattern: most of these are detectable before the next attempt runs, not after. Budget preflight is the clearest example: you know whether the next attempt will breach the remaining budget before you call the agent. Injection scanning can happen before the tool result enters context. File scope can be enforced before any write is admitted.\n\nThat's the shift we made building MartinLoop: pre-execution enforcement as the primary defense, post-execution logging as the audit trail. 
Not the other way around.\n\n**What this looks like in practice**\n\nBefore a run starts, MartinLoop prints a governed run plan: per-phase cost estimates, routing decisions, burn percentage against session budget, and priority ordering.\n\nAfter a run completes, it prints a receipt: every commit, every repo, every feature.\n\nA session we ran last week on our own codebase: $9.60 estimated, $16 cap, 13 commits across 3 repos, 9 new features, estimate held.\n\nThe agent calculated the budget itself; that's not a number you type in. It's the governance layer doing pre-execution cost estimation before any attempt is admitted.\n\n**Try it (bash)**\n\nnpx -y martin-loop@latest demo\n\n**Full install:**\n\nnpm install -g martin-loop\n\nmartin run \"fix the auth regression\" --budget 3 --verify \"pnpm test\"\n\n**MCP for Claude Code:**\n\nclaude mcp add --scope user martin-loop -- npx -y @martinloop/mcp\n\n**Open source, Apache 2.0:** [GitHub repo](https://github.com/Keesan12/martin-loop)\n\n(Please do us a favor and star the repo if you like it so we can keep it OSS.)\n\nWhat failure modes have you hit that aren't on this list?\n\nWe're still building the taxonomy and are genuinely curious what's showing up in real runs.", "url": "https://wpnews.pro/news/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai", "canonical_source": "https://dev.to/cryptokeesan/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai-coding-agents-pi7", "published_at": "2026-06-30 20:41:47+00:00", "updated_at": "2026-06-30 21:19:13.085576+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "developer-tools", "machine-learning", "ai-safety"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai", "markdown": "https://wpnews.pro/news/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai.md", "text": 
"https://wpnews.pro/news/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai.txt", "jsonld": "https://wpnews.pro/news/what-12-failure-classes-and-30-billion-tokens-spent-taught-us-about-trusting-ai.jsonld"}}