"Read-Only Reviewer Agents Catch What Your Main Agent Waves Through" A developer proposes using a read-only reviewer agent to catch errors in code generated by a main agent. The reviewer, which cannot edit code, identifies spec drift, missing cases, test-implementation coupling, and terminology mismatches. This structural fix improves verification over prompting the main agent to self-review. The main agent writes the code. The reviewer agent reads it. The reviewer can't edit anything. That constraint — no edit permission — is the whole point. It's not a safety measure. It's what makes the reviewer trustworthy. I noticed a pattern in my own agentic work. The main agent finishes a feature, the tests pass, I mark it done. Later I find something wrong — a null case not handled, an error message that doesn't match the spec, a behavior that passes the test but fails in a slightly different context. Things that were visible in the code. Things I should have caught. The main agent wrote the code and verified the code. By the time it's reading its own output, it has already mentally modeled the feature as complete. It reads with the assumption that things are correct and looks for confirmation. It finds confirmation, because it wrote the code to pass the test it also wrote. This is a verification problem, not an intelligence problem. The same issue shows up in human code review — the author is the worst person to catch their own oversights, not because they're careless but because they're too close to it. You could try to solve this with prompting. Tell the main agent to "review critically before finishing." I've tried it. It helps a little. The agent will surface some issues it would have let slide. But it's working against its own completion-bias, and that's a weak force. Scoping tools differently is a structural fix, not a nudge. A reviewer agent that literally cannot call any write or edit tool has a different relationship to the code. It's not trying to finish. There's nothing to finish. It reads the diff, reads the spec, reads the test results, and that's all it can do. Its job is to notice things, not to fix them. When you remove the option to fix, the noticing gets sharper. In practice the reviewer finds a few categories of things: Spec drift. The main agent made a small judgment call somewhere that deviated from the spec. Not wrong exactly — a reasonable decision — but not what the spec says. The reviewer has the spec open and no stake in the implementation, so it flags it. The main agent might not have even noticed the deviation because the implementation "feels right." Missing cases. The main agent handled the happy path and the obvious error case. The reviewer asks: what happens if this input is empty? What if this call fails? These are findable by reading; they just require a different mode of attention than writing. Test-implementation coupling. Sometimes the test passes because the implementation and the test were written together, with the same blind spot. The reviewer reads both and asks whether the test actually covers the behavior, or just covers the code. These are different things. Terminology mismatch. Error messages, variable names, API response fields that don't match the spec or the existing codebase conventions. Small. Easy to miss when you're focused on logic. Easy to catch when you're only reading. The mechanics are simple. You're running two agent calls sequentially, not in parallel. The main agent does its work. When it's done, you pass the diff and the spec to a second agent with a system prompt that says it's a reviewer — and you configure it with read-only file access. In Claude Code specifically, you can scope tool permissions at the agent level. The reviewer gets Read , Grep , Glob . No Edit , no Write , no Bash . It literally can't change anything even if it wanted to. The reviewer's output is a list of findings. The main agent then gets those findings and addresses them. One more pass, one more reviewer check if needed. Short scripts, throwaway code, things you'll rewrite anyway — the overhead isn't worth it. The reviewer adds latency and cost. For anything going to production, or anything that needs to match a spec, or anything you'll hand off: worth it. The other case where it adds real value is when the spec is load-bearing. If you've written precise acceptance criteria and you want to know whether the implementation actually meets them — not just "tests pass" but "the criteria are satisfied" — the reviewer is good at that comparison. The problem isn't that the main agent makes mistakes. It's that agent and reviewer are the same entity if they share context, state, and stake in the outcome. Separate the concerns. Give each agent exactly the tools its role requires, no more. The reviewer doesn't need to write because its job is to read. When it can only read, it reads better.