Cerberus – a local firewall for AI agents' tool calls Cerberus, a local-first security gateway for AI coding agents, intercepts every tool call before execution, risk-scores it across four signals, and allows, audits, asks for human approval, or blocks it—all on the user's machine with no external API. The tool addresses the risk of autonomous agents running shell commands, editing files, or making network calls without human oversight, preventing actions like secret exfiltration, excessive permissions, dangerous egress, and tool abuse. A local-first security gateway for autonomous AI coding agents. Cerberus sits between the agent Claude Code, Codex, Cursor, Cline and your machine, intercepts every tool call before it runs, risk-scores it across four signals, and either allows, audits, asks for human approval, or blocks it — all on your machine, with no external API and nothing leaving the box. Autonomous coding agents run shell commands, edit files, and make network calls on your behalf — at machine speed, often unattended. One bad step rm -rf , an unwanted git push , a leaked .env , a poisoned README that tricks the agent into exfiltrating secrets and there's no human in the loop to stop it. Cerberus puts that checkpoint on the tool boundary , where the agent actually acts. PreToolUse ─▶ intercept ─▶ Policy + Behavioral + Content + Injection ─▶ Risk Engine ─▶ ALLOW · AUDIT · HITL · BLOCK PostToolUse ─▶ inspect ─▶ secret + injection detection ─▶ session contamination state Four deterministic signals aggregated into one weighted risk score, with a hard floor that absolute prohibitions can never override. 🟢 Secret exfiltration — detects secrets loaded into context, then content-matches the outbound payload : holds the call that actually carries the key raw or base64/hex/url-encoded , with provenance source: .env:4 · sha256:… · 97% and never logging the secret itself. 🟢 Excessive permissions — every call gated; unknown tools fail-closed; sensitive paths ~/.ssh , ~/.aws , credentials, /etc/passwd held; destructive commands rm -rf , Remove-Item -Recurse , chmod 777 , kill -9 blocked or held. 🟢 Dangerous egress — destination policy: trusted hosts registries, GitHub, OpenAI/Anthropic auto-allowed; paste sites / webhook catchers / raw-IP destinations held. 🟡 Tool abuse — runaway-loop and tool-call-rate/repetition detection. 🟡 Prompt injection — detects injection in tool results and gates the next egress heuristic classifier; optional local DeBERTa model . It sees tool calls, not the LLM prompt — so it catches the exploitation of an injection the egress , not the injection itself. Terminal-first approval — held calls surface in the agent's native permission prompt Claude Code / Cursor , or via cerberus approve