{"slug": "cerberus-a-local-firewall-for-ai-agents-tool-calls", "title": "Cerberus – a local firewall for AI agents' tool calls", "summary": "Cerberus, a local-first security gateway for AI coding agents, intercepts every tool call before execution, risk-scores it across four signals, and allows, audits, asks for human approval, or blocks it—all on the user's machine with no external API. The tool addresses the risk of autonomous agents running shell commands, editing files, or making network calls without human oversight, preventing actions like secret exfiltration, excessive permissions, dangerous egress, and tool abuse.", "body_md": "A **local-first security gateway for autonomous AI coding agents.** Cerberus sits between the agent\n(Claude Code, Codex, Cursor, Cline) and your machine, intercepts **every tool call** before it runs,\nrisk-scores it across four signals, and either **allows, audits, asks for human approval, or blocks**\nit — all on your machine, with **no external API and nothing leaving the box.**\n\nAutonomous coding agents run shell commands, edit files, and make network calls on your behalf — at\nmachine speed, often unattended. One bad step (`rm -rf`, an unwanted `git push`, a leaked `.env`, a\npoisoned README that tricks the agent into exfiltrating secrets) and there's no human in the loop to\nstop it. 
Cerberus puts that checkpoint **on the tool boundary**, where the agent actually acts.\n\n```\nPreToolUse  ─▶ intercept ─▶ Policy + Behavioral + Content + Injection ─▶ Risk Engine ─▶ ALLOW · AUDIT · HITL · BLOCK\nPostToolUse ─▶ inspect   ─▶ secret + injection detection ─▶ session contamination state\n```\n\nFour deterministic signals are aggregated into one weighted risk score, with a hard floor so that absolute prohibitions can never be overridden.\n\n- **🟢 Secret exfiltration** — detects secrets loaded into context, then **content-matches the outbound payload**: holds the call that actually carries the key (raw or base64/hex/url-encoded), with provenance (`source: .env:4 · sha256:… · 97%`), and never logs the secret itself.\n- **🟢 Excessive permissions** — every call gated; unknown tools fail-closed; sensitive paths (`~/.ssh`, `~/.aws`, credentials, `/etc/passwd`) held; destructive commands (`rm -rf`, `Remove-Item -Recurse`, `chmod 777`, `kill -9`) blocked or held.\n- **🟢 Dangerous egress** — destination policy: trusted hosts (registries, GitHub, OpenAI/Anthropic) auto-allowed; paste sites / webhook catchers / raw-IP destinations held.\n- **🟡 Tool abuse** — runaway-loop and tool-call-rate/repetition detection.\n- **🟡 Prompt injection** — detects injection in tool *results* and gates the next egress (heuristic classifier; optional local DeBERTa model). 
It sees tool calls, **not the LLM prompt** — so it catches the *exploitation* of an injection (the egress), not the injection itself.\n\n- **Terminal-first approval** — held calls surface in the agent's native permission prompt (Claude Code / Cursor), or via `cerberus approve <id>` / a localhost dashboard.\n- **Forensic dashboard** — per-session timeline, risk-factor breakdown, and a **Replay** player that steps through how a session's risk built up.\n- **Multi-agent** — one adapter layer serves Claude Code, Codex, Cursor, and Cline.\n- **Policy as data** — rules and risk weights are editable YAML, not code.\n- **Local-first** — binds to `127.0.0.1`, no external API, no telemetry; secret *values* never touch disk or logs.\n\n```\nnpm i -g @cerberussec/core      # or run ad-hoc with: npx @cerberussec/core <cmd>\n\n# wire Cerberus into your agent (merges into the agent's config — backed up, idempotent):\ncerberus init                 # Claude Code, project-level   (--agent codex|cursor|cline, --global, --print)\n\n# start the gateway + dashboard (one process):\ncerberus engine               # then open http://127.0.0.1:9000/\n```\n\nUse your agent as usual — tool calls now route through Cerberus. By default a held (HITL) call is\n**approved right in the terminal**: Cerberus returns `ask`, so Claude Code shows its native\npermission prompt with Cerberus's reason — approve/deny without leaving your session.\n\nThe dashboard (`http://127.0.0.1:9000/`) has a **Live** tab (Action Center + stream) and a\n**Sessions** tab — a forensic timeline per session with a risk-factor breakdown and a **Replay**\nplayer to step through how a session's risk built up.\n\nCerberus runs *inside* the agent's execution loop, so the terminal is the realtime decision point\nand the dashboard is the deep dive. 
Per severity (default `AG_APPROVAL_SURFACE=terminal`):\n\n| verdict | terminal | web UI |\n|---|---|---|\n| BLOCK | ⛔ denied in-terminal (Claude shows the reason) + optional auto-open | forensics |\n| HITL | ✋ Claude's native permission prompt, with Cerberus's reason | forensics |\n| AUDIT | — (quiet) | elevated-risk record |\n| ALLOW | — (silent) | — |\n\nPrefer a central web queue instead? Set **`AG_APPROVAL_SURFACE=dashboard`** — held calls then pause on\nthe engine's synchronous hold and you Approve/Deny from the dashboard (or the terminal, out-of-band):\n\n```\ncerberus pending              # list calls held for review (with their ids)\ncerberus approve <id>         # release a held call …\ncerberus deny <id>            # … or deny it\n```\n\nExtra terminal alerts write to the controlling terminal (`/dev/tty`, falling back to stderr) so the\nprotocol channel to Claude Code stays clean. Tune via env:\n\n| env | default | effect |\n|---|---|---|\n| `AG_NOTIFY` | `1` | extra terminal alert lines on/off (`0` to silence) |\n| `AG_APPROVAL_SURFACE` | `terminal` | `terminal` ⇒ HITL via Claude's native prompt; `dashboard` ⇒ socket hold + dashboard approve |\n| `AG_AUTO_OPEN` | `off` | `block` ⇒ auto-open the investigation UI on a BLOCK/EXFIL |\n\nThe engine + signals + risk + dashboard are agent-agnostic; only a thin **adapter** (parse the agent's\nhook event → normalize → emit its verdict shape) is per-agent. 
Wire one with `cerberus init --agent <name>`:\n\n| agent | `--agent` | HITL approval | notes |\n|---|---|---|---|\n| Claude Code | `claude` (default) | native terminal prompt (`ask`) | verified end-to-end |\n| Codex CLI | `codex` | dashboard hold (no native ask) — `AG_APPROVAL_SURFACE=dashboard` | enterprise `requirements.toml` makes it non-bypassable |\n| Cursor | `cursor` | native IDE prompt (`ask`) | init sets `failClosed: true` |\n| Cline | `cline` | dashboard hold (`cancel` bool) | macOS/Linux only |\n\nThe `codex` / `cursor` / `cline` adapters follow the published hook specs; verify against your installed version\n(`cerberus init --agent <name> --print` shows the exact config). Roo Code is unsupported (archived 2026).\n\n- **PreToolUse hook → `/intercept`** is the single hard enforcement point (allow/deny/ask; or HITL holds the socket open until you decide).\n- **PostToolUse hook → `/inspect`** is observe-only: it updates the session's contamination state so the *next* action is judged with full context. It never modifies a tool result.\n- The engine is **agent-agnostic** at its core; per-agent adapters (`--agent`) are the only thing that differs.\n\n```\nPreToolUse  ─▶ /intercept ─▶ Policy + Behavioral + Content/Injection ─▶ RiskEngine ─▶ ALLOW/AUDIT/HITL/BLOCK\nPostToolUse ─▶ /inspect   ─▶ secret detection + injection classifier ─▶ session contamination state\n                                                                   (audit log + WebSocket → dashboard)\n```\n\nSingle Node + TypeScript package; the dashboard is a Vite/React app served by the engine. Rules and\nrisk weights are editable **YAML data**, not code (`rules/`).\n\nCerberus is a **runtime gateway on the tool boundary**. It's strongest at secret-exfiltration\nprevention and as a permission chokepoint. 
Because it sees tool calls (not the LLM prompt), it catches\nthe *exploitation* of a prompt injection — not the injection itself — and it does **not** cover\ndata-pipeline / RAG poisoning. The exfil match is high-confidence but not airtight (novel secret formats,\nsplit-across-calls encoding). Honest defaults over false guarantees.\n\nNo external API, no API key, nothing leaves the machine. The optional injection model\n([@cerberussec/injection-model](/Adirdabush1/cerberus/blob/main/packages/injection-model), ProtectAI DeBERTa, Apache-2.0) upgrades\nthe built-in heuristic classifier; install it only if you want it. The core is OSS-clean\n(Apache/MIT-compatible deps); Meta Prompt-Guard is deliberately kept out of core (Llama license).\n\n```\n# from a clone: install (root + dashboard are separate npm projects) and build\nnpm install && npm --prefix dashboard install\nnpm run build             # compile the engine (tsc → dist) + dashboard (vite → dashboard/dist)\n\nnpm run engine            # run from source via tsx (dev)\nnpm run typecheck\nnpm run test:behavioral && npm run test:content && npm run test:injection && npm run test:risk \\\n  && npm run test:init && npm run test:projector && npm run test:audit && npm run test:notify \\\n  && npm run test:security && npm run test:policy && npm run test:adapters\nnpm run e2e:behavioral && npm run e2e:content && npm run e2e:injection && npm run e2e:risk\n```\n\nSee `PLAN.md` for milestones and `brainstorms/` for the design records behind each decision.", "url": "https://wpnews.pro/news/cerberus-a-local-firewall-for-ai-agents-tool-calls", "canonical_source": "https://github.com/Adirdabush1/cerberus", "published_at": "2026-06-28 04:49:30+00:00", "updated_at": "2026-06-28 05:04:36.522071+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "ai-tools", "ai-infrastructure", "developer-tools"], "entities": ["Cerberus", "Claude Code", "Codex", "Cursor", "Cline", "OpenAI", "Anthropic", "GitHub"], "alternates": 
{"html": "https://wpnews.pro/news/cerberus-a-local-firewall-for-ai-agents-tool-calls", "markdown": "https://wpnews.pro/news/cerberus-a-local-firewall-for-ai-agents-tool-calls.md", "text": "https://wpnews.pro/news/cerberus-a-local-firewall-for-ai-agents-tool-calls.txt", "jsonld": "https://wpnews.pro/news/cerberus-a-local-firewall-for-ai-agents-tool-calls.jsonld"}}