When I first opened Claude Code on a real project, it felt like hiring a brilliant junior who had read every book but never shipped anything. It could write code fast, but it would forget my conventions halfway through a session, run a migration without asking, and "fix" a bug by deleting the failing test.
After a few weeks of tuning, it now behaves like a small senior team: it plans before it codes, reviews its own work, writes tests, and refuses to run anything destructive. The difference wasn't a better model. It was a better setup β a CLAUDE.md
, a set of focused subagents, a couple of MCP servers, and two small hooks.
Here is exactly how that setup works, so you can build your own.
CLAUDE.md
is the file Claude Code reads automatically at the start of every session. Most people treat it like a README. It works far better as a contract: short, specific, and written in always/never terms the agent can't misread.
The two rules that gave me the biggest jump in quality:
## Build & test (run these, do not guess)
- Install: pnpm install
- Dev: pnpm dev
- Test: pnpm test -- --run
- Lint: pnpm lint
- Typecheck: pnpm typecheck
## Always
- Run `pnpm typecheck && pnpm test -- --run` before saying a task is done.
- Use parameterized queries. Never build SQL with string concatenation.
- When you change a public function, update its tests in the same edit.
## Never
- Never run `git push`, `git reset --hard`, or any DB migration without me confirming.
- Never add a dependency to fix something a few lines of code can solve.
- Never disable a failing test to make the suite pass.
The reason the "Build & test" block matters: without the exact commands, the agent guesses them (npm test
, yarn test
, make test
) and wastes turns failing. Give it the real commands once and it stops guessing.
Keep this file under ~50 lines. A long CLAUDE.md
gets diluted in the context and the agent starts ignoring the middle of it. Be ruthless.
A single agent juggling "review this, also write tests, also refactor" produces mediocre output at every task. Subagents fix this by giving each job its own focused instructions and its own context window. In Claude Code these live in .claude/agents/*.md
and you invoke them by name.
A subagent is just a Markdown file with frontmatter and a system prompt. Here's a real one β my code-reviewer
:
---
name: code-reviewer
description: "Reviews a diff for bugs, security issues, and convention violations. Use after writing or changing code."
tools: Read, Grep, Bash
---
You are a senior code reviewer. Review ONLY the changed lines plus their immediate context.
Output exactly three sections:
1. **Blocking** β bugs, security holes, data loss risks. Each with file:line and a fix.
2. **Should fix** β correctness or clarity issues that aren't blocking.
3. **Nits** β style/naming. Keep this short.
Rules:
- If there are no blocking issues, say so on line one. Do not invent problems to look thorough.
- Check: injection, missing error handling, N+1 queries, unhandled null, race conditions.
- Quote the exact line you're flagging. No vague "consider improving error handling".
The two lines that make it useful are "Do not invent problems to look thorough" and "Quote the exact line." Without them, reviewers either rubber-stamp everything or generate a wall of generic advice. With them, you get actionable output.
I run a roster of focused specialists this way β a debugger
that reproduces before it theorizes, a test-writer
that targets edge cases instead of happy paths, a security-auditor
that thinks in OWASP categories, an api-designer
, a refactor-architect
, and so on. Each one is a small file with a tight output format. The pattern is always the same: one job, one output format, one anti-fluff constraint.
MCP (Model Context Protocol) servers are how Claude Code reaches outside the chat β the filesystem, a browser, a persistent memory store. You configure them in .mcp.json
:
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "./"]
},
"sequential-thinking": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
}
}
}
The one I'd add first is sequential-thinking. It gives the agent a scratchpad to break a problem into steps before acting, which noticeably reduces the "confidently wrong" answers on multi-step tasks. A browser MCP (Playwright) is the second I reach for, because it lets the agent actually load the page and read the console instead of guessing why the UI broke.
One caution: MCP servers move fast and some get abandoned. Pin to maintained ones and check they still install before you rely on them.
Hooks are shell commands Claude Code runs on events you choose. Two are worth setting up on day one.
Auto-format on save β so you never review a diff full of whitespace noise:
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [{ "type": "command", "command": "pnpm prettier --write \"$CLAUDE_FILE_PATHS\"" }]
}
]
}
}
Block dangerous commands β a pre-execution guard that refuses destructive shell calls even if the agent tries them:
#!/usr/bin/env bash
if echo "$CLAUDE_TOOL_INPUT" | grep -qE 'rm -rf /|git reset --hard|DROP TABLE'; then
echo "Blocked: destructive command refused by guard hook." >&2
exit 1
fi
This is the difference between trusting an autonomous agent and babysitting it. The model can be wrong; the hook is deterministic.
With the setup above, my actual workflow on a new feature is short:
code-reviewer
on the diff, then test-writer
for the gaps it found.pnpm typecheck && pnpm test
runs because CLAUDE.md
says it must.The agent does the typing. I do the deciding. That's the whole point.
Building all of this from scratch is a real time sink β the 26 subagents alone took me weeks of tuning to get the output formats right. If you'd rather skip that and drop a finished setup into any repo, I packaged my exact configuration as ** Claude Code Agent OS**: 26 specialized subagents, 12 slash commands (
/plan
, /review
, /test
, /ship
...), 5 ready-to-edit CLAUDE.md
templates, the MCP configs, and the safety + format hooks (bash and PowerShell) β with a START HERE quickstart that gets you running in about 5 minutes.But you don't need it to get value from this post. Start with a tight CLAUDE.md
and one code-reviewer
subagent today; that pair alone will change how your next commit feels.
If you build your own subagent roster, I'd genuinely like to see which specialists you find most useful β drop them in the comments.