The Audit Tax: Why Your Agent Made You Slower

A developer argues that AI coding agents, while fast at generating pull requests, impose an 'Audit Tax' that slows down software delivery because human review of agent-written code takes disproportionately long. Citing a LinearB report showing AI PRs take 4.6x longer to review, the developer recommends a three-tier verification pipeline: deterministic checks, a review subagent, and human oversight. The post concludes that mastering verification is the key to unlocking AI engineering productivity.

Originally published in Temrel, a weekly newsletter on agentic engineering.

You ask an agent to code an update. It takes about 90 seconds to produce the PR. You then spend the next 90 minutes reading it line by line to decide whether you trust it. You might, whisper it, be shipping code more slowly than you were before.

Agent-based development velocity is borrowed time, re-invoiced with interest at review time. The agent writes the PR in seconds; you pay for that speed in the time it takes to decide whether to trust what it has written. This is the Audit Tax.

This post is a deliberate sequel to last week's "Stop prompting, start looping." Verification was one of the six dials we covered there, and today we focus on that one.

Code generation is effectively solved. By mid-2026, even the die-hard holdouts can't seriously argue that coding agents underperform humans in commercial environments. The hard part now is verification.

The old scoreboard measures the wrong things: model benchmarks, tokens per second, and the rest. The real measure is how quickly agent-produced code reaches production. According to LinearB's 2026 Software Engineering Benchmarks Report (https://linearb.io/resources/software-engineering-benchmarks-report), AI PRs take 4.6x longer to get reviewed. That delay is a by-product of higher PR volume and faster delivery, and it is the biggest blocker to AI engineering productivity.

Verification is harder than it looks. You can't interrogate the agent and trust the answer, because the hallucination might be buried in the reasoning. Your old heuristics for reviewing human code are unfit for the task: agent-written PRs always look clean and self-confident, whether they work or not. Sloppy formatting and thin documentation no longer signal a weak PR, so you can't kick it back on those grounds. Enforcing small diffs doesn't work either: try it and "4.6x longer" becomes a stretch goal, because you'll be drowning in PRs forever. Individual reliability means nothing now.
John, the old hand who always shipped clean code and earned a cursory review? John's gone. There's just Claude now. And don't forget: you contribute to The Sloppening (https://spark.temrel.com/p/the-sloppening) every time you push slop to the codebase.

Get your cheap, deterministic gates in first: typecheck, tests, lint, build. You already have them, they're fast and virtually free, and they catch the stupid mistakes. Anthropic calls these code-based graders.

Then add a review subagent, which Anthropic calls a model-based grader. It should check the diff against the stated intent, not just whether the code builds and runs.

Then put a human in the loop: a person's eyes on anything that survives the deterministic and agent-review gates. The machines clear the early hurdles; the human decides whether the output reaches production. Anthropic calls these human graders.

Anthropic recommend starting evals early, and so do I. Record the cases where the agent misses requirements, and once you have around 20, start building your evals: your deterministic checks plus an LLM-as-judge for the fuzzier question of intent. Wire them to triggers so they run automatically instead of being kicked off by hand. There's an in-depth Anthropic blog post on methodology (https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents), though it is lighter on technical implementation. Take that as a sign of how early this step in the agentic loop still is.

This is the dev career ladder, reframed. We started with context engineering (2024), then loop engineering (2026), and now verification. Follow the thread and you become one of the top players in software development, well placed for what comes next. Whoever owns verification owns the bottleneck, and whoever owns the bottleneck owns the leverage.

Code generation is solved. The tax is rigorous evaluation. Pay the tax on purpose, or pay it by accident.
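The Python sketch below shows the three-tier pipeline and the habit of logging agent misses described above. It is a minimal sketch under stated assumptions. It is not Anthropic's method or any library's API. The names `verify`, `run_code_graders`, `Verdict`, `MissLog`, and the `Reviewer` signature are all hypothetical. The gate commands are placeholders for your own typecheck, test, lint, and build commands. The review subagent is reduced to a callable you would wire to whatever model-based grader you use. The pipeline fails fast on the cheap gates, so you don't pay for the slower tiers, and it never merges anything itself: the best outcome is a hand-off to a human grader.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signature for the review subagent: (diff, stated_intent) -> (ok, reason).
# Not a real library API; connect it to whatever model-based grader you use.
Reviewer = Callable[[str, str], tuple[bool, str]]


@dataclass
class Verdict:
    stage: str             # tier that stopped the PR, or "human" if it reached a person
    machines_passed: bool  # True only if both automated tiers passed
    notes: list[str] = field(default_factory=list)


def run_code_graders(gates: dict[str, list[str]]) -> tuple[bool, list[str]]:
    """Tier 1: cheap deterministic gates (typecheck, tests, lint, build)."""
    notes = []
    for name, cmd in gates.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            notes.append(f"{name} failed (exit {proc.returncode})")
            return False, notes  # fail fast: skip the slower, costlier tiers
        notes.append(f"{name} ok")
    return True, notes


def verify(diff: str, intent: str, gates: dict[str, list[str]], reviewer: Reviewer) -> Verdict:
    ok, notes = run_code_graders(gates)
    if not ok:
        return Verdict("code-based", False, notes)
    # Tier 2: a model-based grader checks the diff against the stated intent.
    ok, reason = reviewer(diff, intent)
    notes.append(f"review: {reason}")
    if not ok:
        return Verdict("model-based", False, notes)
    # Tier 3: nothing merges automatically; a human grader makes the final call.
    return Verdict("human", True, notes)


@dataclass
class MissLog:
    """Cases where the agent missed a requirement, kept as seeds for an eval suite."""
    cases: list[tuple[str, str]] = field(default_factory=list)  # (intent, diff) pairs
    threshold: int = 20  # the post suggests collecting around 20 before building evals

    def record(self, intent: str, diff: str) -> None:
        self.cases.append((intent, diff))

    def ready_for_evals(self) -> bool:
        return len(self.cases) >= self.threshold


if __name__ == "__main__":
    # Placeholder gates that always pass; swap in your real commands.
    gates = {
        "typecheck": [sys.executable, "-c", "pass"],
        "tests": [sys.executable, "-c", "pass"],
    }

    def stub_reviewer(diff: str, intent: str) -> tuple[bool, str]:
        # Stand-in for a review subagent: a real one would judge the diff against the intent.
        ok = "retry" in diff
        return ok, "addresses intent" if ok else "intent not addressed"

    v = verify("+ add retry to fetch()", "add retries to fetch()", gates, stub_reviewer)
    print(v.stage, v.machines_passed)  # prints: human True
```

The design choice worth copying is the ordering. Each tier runs only if the cheaper one before it passed, and the final state is always a human decision. `MissLog` only tracks the roughly 20 cases the post suggests collecting. Turning them into an eval suite with deterministic checks and an LLM-as-judge is left to your own harness.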