# Agents Verifying Agents: Turtles All the Way Down

> Source: <https://dev.to/jaikora/agents-verifying-agents-turtles-all-the-way-down-1mdh>
> Published: 2026-07-04 14:10:14+00:00

Last week I caught myself writing a subagent whose entire job was to check the output of another subagent. Then, about twenty minutes in, I paused and asked myself what was checking that one. Nothing was. I'd built a verifier for my verifier and stopped there, not because I'd solved anything, but because I ran out of patience.

That's the part nobody tells you about agent pipelines. The verification pattern works, genuinely. Two-stage checks, like triggering an actual crash before a report even counts, then having a second pass confirm it's not a test-only fluke, cut down on false positives in a real way. I've seen it happen in my own setup too: a bug report that would've wasted an hour of a human's time gets killed before it ever reaches a person. That part is good. That part I'd defend to anyone.

But somewhere in scaling that pattern up, I stopped asking whether I was fixing the reliability problem and started just relocating it. Every layer I add is another place for something to fail silently. An agent stalls mid-task and its status just says "working" forever, and unless I happen to check, I won't know. I'm not exaggerating when I say I've had agents sit idle for an hour because nothing in the pipeline was watching the watcher. You don't notice these things until you're managing enough agents that manual review stops scaling, and by then the problem isn't the original task, it's the six layers of infrastructure you built to trust the task.

Here's the reframe I keep coming back to: verification isn't a wall, it's a filter with a mesh size. Every subagent you add narrows the mesh a little, but it never closes the hole, it just moves it somewhere less visible. I used to think adding another check meant more confidence. Now I think it means I've added another silent failure mode and convinced myself I haven't, because the dashboard says green.

The heuristic I actually use now: for every verifier I add, I ask what happens when the verifier itself is wrong, and whether I'd notice. If the honest answer is "I don't know," that's not a verification layer, that's a place I've hidden risk instead of removing it.

I don't have a clean fix. I prune stale agents by hand, I still get surprised by stalls, and I still catch myself reaching for one more layer of checking when what I actually need is a better original task. How many layers of verification are you running before you stop and ask whether you trust any of them?
