{"slug": "compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents", "title": "Compass – guardrails and a hard budget cap for AI coding agents", "summary": "Compass, a local-first configuration layer for AI coding agents, introduces guardrails, cost routing, and a hard budget cap to prevent runaway spending and unsafe actions. The tool enforces a dollar limit that halts sessions before the next tool call, blocks catastrophic commands, and provides a measured guardrail policy scored 100/100 in CI. It aims to give developers verifiable control over AI agent behavior.", "body_md": "`budget gate` · `guardrails 100/100` · `~61% cheaper routing` · `signed releases` · `100% local` · `no telemetry` · `you always merge`\n\n**Real session, no edits:** the cost climbs to $0.35, then the next action is **HALTED** at the $0.05 cap — before it spends more.\n\n**compass is a local-first config layer for Claude Code, Codex & Gemini that stops your agent from doing three things it shouldn't** — burning your budget, running unsafe commands, and merging unverified code. Set `COMPASS_MAX_USD=5` and the session hard-stops at the cap; catastrophic commands are blocked before they run, and the guardrail policy is scored **100/100** in CI — not asserted. 
You install it once, and **you always merge.**\n\n```\n# no curl|sh, fully reversible — then just open any repo in your agent\ngit clone https://github.com/dshakes/compass ~/compass && cd ~/compass && ./quickstart.sh\n# or, inside Claude Code:   /plugin marketplace add dshakes/compass\n```\n\n[▶ See it work](#see-it-work) · [Why it's different](#why-its-different--measured-not-vibes) · [The self-fixing PR loop](#-the-part-people-screenshot-it-fixes-its-own-prs) · **Install** · [What's in the box](#whats-in-the-box) · [📚 Docs](/dshakes/compass/blob/main/docs/11-using-compass.md)\n\nOpen a pull request and compass **reviews it, security-checks it, runs the tests, cross-audits it with a second model — then pushes its own fixes until it's green.** You just merge.\n\n**The idea in one line: the loop is the unit of work.** A one-shot agent stops at its first wrong answer. compass *loops* — **generate → test → critique → fix → repeat against a gate** — so quality comes from iteration, not one lucky prompt. The same closed loop runs a single PR, or your whole fleet of repos overnight. *(Try it locally in 30s, no tokens — watch it ↓.)*\n\nEvery AI-agent config claims \"safe\" and \"cheap.\" compass is the one that hands you the **number** — and lets a skeptic reproduce it in 30 seconds. Everyone has the same models; the edge is *configuration you can trust*, not another feature list. Five claims, five commands:\n\n**🛡 Guardrails with a score.** Catastrophic commands and secret writes are blocked *before they run* — and the policy is eval-gated, not asserted. 
*(In human terms: it won't let the agent delete your machine or leak your keys, and it can prove how well.)*\n\n```\ncompass bench     # → guardrail 100% precision/recall (61-case corpus), router 96.9% — in CI\n# then ask the agent to `rm -rf /` or write a .env → denied; `rm -rf ./build` → allowed\n```\n\n**📉 Cost routing that's measured.** Cheap work goes to cheap models — scored against an eval set, ~61% cheaper than all-Opus at ~98% quality on a fair mix. *(In human terms: it stops paying Opus prices to fix a typo.)*\n\n```\ncompass route \"redesign the auth model\"   # → opus\ncompass route \"fix a typo\"                 # → haiku\n```\n\n**💸 A budget ceiling that actually stops it.** Usage trackers *report* spend; compass *enforces* it — set a dollar cap and the session is **halted before the next tool call** once it's reached, live. *(In human terms: an agent can't quietly run up a $40 bill while you're away — it stops at your number.)*\n\n```\nexport COMPASS_MAX_USD=5     # this session hard-stops at $5 — the agent is blocked, not just warned\ncompass spend --max-usd 5    # the same ceiling on the ledger, for scheduled / fleet runs\n```\n\n**🔏 Supply chain you can verify.** Releases carry keyless SLSA provenance, so a tampered or look-alike download is rejected. *(In human terms: you can prove the code you installed is the code I shipped.)*\n\n```\ncompass verify v0.17.2     # → ✓ provenance verified\n```\n\n**🧪 Red-team resistance, measured.** Prompt-injection (direct/indirect/paste), CLAUDE.md poisoning, local safety-override, malware & insecure-code — scored against a labeled corpus that gates in CI, with optional escalation to a managed guardrails service (webhook · Bedrock · Azure). 
*(In human terms: a poisoned repo or web page can't quietly turn your agent against you.)*\n\n```\ncompass redteam   # → injection corpus 100% P/R, then scans THIS repo's CLAUDE.md/MCP/settings\n```\n\nNo service, no telemetry, no `--dangerously-skip-permissions`; `git pull` to update. The work it can't safely own, it hands back — **you keep the merge.**\n\nSmallest leap of faith first — **the governance moment**, then **feel it**, then **see the proof**, then **see how it works.**\n\n**0 · The budget ceiling, annotated** — the same hard-stop as the hero clip, as a clean walkthrough ($1.80 ✓ → $4.10 ✓ → $5.00 HALTED). Usage trackers report spend; compass enforces it:\n\n**1 · The day-to-day feel** — guardrails, the cost-aware status line, the loop, and the crew, in ~25 seconds:\n\n**2 · The headline, on a real PR** — a Blocking bug and red tests, and it pushes its *own* fix until the PR is green (then waits for you):\n\n**3 · How that loop works** — review · security · tests · Codex cross-audit run in parallel; Blocking findings get auto-fixed and re-reviewed (round-capped) until green, then it stops at you:\n\nRun it locally in 30s with `~/compass/sdlc/orchestrate.sh \"<task>\"` (no tokens), or wire the GitHub loop for every PR. → how it works · reproduce it\n\nAnd the everyday status line quietly keeps score, so you watch it earn its keep:\n\n```\nOpus 4.8 · myrepo · main* · 45k ctx · $1.23 · 🧭 🛡1 🧹2 💡1 📉~$1.65\n```\n\nsession spend, then today's compass activity: 🛡 footguns blocked · 🧹 files formatted · 💡 policy nudges · 📉~$ estimated saved vs all-Opus. Each piece shows only when there's something to report; nothing leaves your machine.\n\nAutonomy here isn't one big magic button — it's the *same closed loop* applied at four scales. Each runs until a gate says \"done,\" then stops at a human. 
That's the whole trick: **iteration under a gate beats a single confident guess.**\n\n| | Loop | What it drives | Where it stops |\n|---|---|---|---|\n| 🔁 | The task loop | generate → test → critique → fix → repeat — one change driven to green | when tests + review pass |\n| 🔎 | The review loop | review → auto-fix the Blocking findings → re-review, round-capped (×3) | hands off to a human if still red |\n| 🛰️ | The fleet loop | the whole pipeline, scheduled across every repo you own, overnight, test-gated | a PR per repo, approve from your phone |\n| 👥 | The workflow loops | parallel agents that fan out, fact-check each other, and converge | one synthesized answer |\n\nEvery loop ends the same way — **you merge.** That gate never moves.\n\n| You want… | You need | Tokens? |\n|---|---|---|\n| The config, guardrails, CLI, subagents (local) | Claude Code (or Codex/Gemini) + `git` | None |\n| The autonomous PR loop (GitHub Actions) | A repo with Actions + `gh`, model auth (`CLAUDE_CODE_OAUTH_TOKEN` or `ANTHROPIC_API_KEY`), and `SDLC_BOT_TOKEN` (fine-grained PAT) so the loop can chain | Yes |\n| Keyless loop (self-hosted runner) | A runner labeled `compass` + `SDLC_BOT_TOKEN` | PAT only |\n| The fleet (every repo) | `FLEET_TOKEN` + `FLEET_MAINTAINER` | Yes |\n\nOne command wires the GitHub loop: `~/compass/sdlc/setup.sh --all` (labels + workflows + CODEOWNERS + secrets + branch protection). Without `SDLC_BOT_TOKEN` the loop still runs — it just won't auto-re-fire after a fix. → [full SDLC setup](/dshakes/compass/blob/main/docs/09-sdlc.md)\n\n**Pick the door that fits — all reversible, version-pinnable, no curl | sh.** You need an AI assistant ([Claude Code](https://code.claude.com); Codex/Gemini optional) + `git`. 
No API keys are needed for the manual, guardrails, crew, and CLI.\n\n**🍺 Homebrew** — managed & versioned\n\n```\nbrew tap dshakes/compass https://github.com/dshakes/compass\nbrew install dshakes/compass/compass     # latest release · --HEAD to track main\ncompass quickstart                       # previews, asks, then wires it into ~/.claude\n```\n\n**📦 Git clone** — own & edit your config *(recommended)*\n\n```\ngit clone https://github.com/dshakes/compass ~/compass && cd ~/compass\ngit checkout v0.17.2     # optional: pin to a release instead of main\n./quickstart.sh          # previews every change, asks first, fully reversible\n```\n\n**🧩 Claude Code plugin** — no terminal *(ideal for a team)*\n\n```\n/plugin marketplace add dshakes/compass\n/plugin install core@compass\n```\n\n**🛠️ By hand:** `make dry-run` (preview) → `make install` → `make doctor`. Symlink install means `git pull` / `brew upgrade` updates everything; `make uninstall` removes only what it added. → [Team rollout](/dshakes/compass/blob/main/docs/05-plugin.md)\n\nFor *every* kind of user: a one-line marketplace/extension install (no terminal), or `make install` if you'd rather own the files. 
Same operating manual + MCP servers, the way each tool expects them:\n\n| Agent | Native install (no terminal) | or own the files |\n|---|---|---|\n| Claude Code | `/plugin marketplace add dshakes/compass` → `/plugin install core@compass` | `make install` |\n| Codex | `codex plugin marketplace add dshakes/compass` → `/plugin install` | `make install` (`~/.codex/AGENTS.md` + `config.toml`) |\n| Gemini CLI | `gemini extensions install https://github.com/dshakes/compass` | `./install.sh --gemini` (`~/.gemini/GEMINI.md`) |\n| Cursor · Copilot · OpenCode · Windsurf | read the repo's `AGENTS.md` ( | clone + `make install` |\n\n`CLAUDE.md` · `AGENTS.md` · `GEMINI.md` are **one file** (symlinks), and the Claude/Codex plugin manifests + Gemini extension are generated from one source and CI-checked (`scripts/check-vendor.sh`) — so a `git pull` updates every agent at once and a manifest can't drift.\n\nThe marketplace/extension manifests match each vendor's documented schema and are structure-validated in CI. The live install is manually verified: `gemini extensions install` (gemini 0.26.0) and `codex plugin marketplace add` (codex 0.130.0) both succeed against this repo, but the live install isn't run in our CI (those CLIs aren't in the runner).\n\n```\ncompass doctor      # validate the install — expect \"0 error\"\ncompass status      # is compass active here, and what's loaded?\n```\n\nThen just open Claude Code as usual — the manual, guardrails, subagents, commands, and status line are already loaded. Feel it in a minute: ask for a dangerous command (blocked), run `/review` on your diff, or `compass route \"<task>\"` to see the tier it picks. No tokens, no signup for any of it.\n\nEverything below is **on after one install** or a single opt-in — the autonomous loops above sit on top of this. 
The README sells; the docs explain — each row links to the detail.\n\n| | Capability | One line | Deep dive |\n|---|---|---|---|\n| 🔁 | **Autonomous SDLC** | the review → security → tests → Codex audit → auto-fix → re-review loop; you merge | |\n| | **The fleet** | *all* your repos through a test gate; approve from your phone | [14-fleet](/dshakes/compass/blob/main/docs/14-fleet.md) |\n| | **The crew + workflows** | | [12](/dshakes/compass/blob/main/docs/12-every-agent.md) · [13](/dshakes/compass/blob/main/docs/13-workflows.md) |\n| | **Guardrails & scanning** | `compass scan`), auto-format, keep a JSONL audit log | [16-hardening](/dshakes/compass/blob/main/docs/16-hardening-and-frontier.md) |\n| | **Red-team hardening** | | [17-red-team](/dshakes/compass/blob/main/docs/17-red-team.md) |\n| | **Cost-tier router** | | [router/](/dshakes/compass/blob/main/router) |\n| | **The compass CLI** | `onboard · impact · drift · scan · redteam · sandbox · verify · audit-log · spend · dashboard` | [11-using](/dshakes/compass/blob/main/docs/11-using-compass.md) |\n| | **MCP + LSP** | **version-pinned** MCP servers (context7 · fetch · git) + opt-in language-server intelligence | [04](/dshakes/compass/blob/main/docs/04-mcp.md) · [06](/dshakes/compass/blob/main/docs/06-lsp.md) |\n| | **Every agent, one source** | [standard](https://agents.md/) `AGENTS.md` | [12-every-agent](/dshakes/compass/blob/main/docs/12-every-agent.md) |\n| | **Live budget ceiling** | `COMPASS_MAX_USD`) — enforced, not just reported | [02-cost](/dshakes/compass/blob/main/docs/02-cost-and-models.md) |\n| | **Cost discipline** | `compass spend` / `impact` to see the $ | [02-cost](/dshakes/compass/blob/main/docs/02-cost-and-models.md) |\n\nBuilt to be **trusted before it's run** — and honest about its limits.\n\n**You own the irreversible.** Agents prepare; humans push, merge, deploy. Required checks + a code-owner approval enforce it — there's no \"merge to prod\" button.\n\n**Readable & reversible.** No `curl | sh`. The installer backs up what it replaces, is idempotent, and `make uninstall` removes only what it added. 
Pin a tag, not `main`.\n\n**Guardrails reduce footguns; they are not a security boundary.** Keep least-privilege credentials and review your diffs. (For untrusted code, `compass sandbox` is a real boundary.)\n\n**Red-team hardening is defense-in-depth, not immunity.** It warns on prompt-injection (direct/indirect/paste), CLAUDE.md poisoning, and local safety-override, and refuses to grant project-level safety exceptions — but the cardinal rule (external content is data, not instructions) and the human gate are what actually hold. `compass redteam` measures it; see `docs/17-red-team.md`.\n\n**What talks to the network.** compass phones home to nothing. The auto-registered MCP servers reach non-Anthropic endpoints — `context7` → Upstash (library docs), `fetch` → URLs you request; `git` is local. Hooks are short, commented shell scripts in `claude/hooks/`; disable any via `claude/settings.json`.\n\n**Grounded, not invented.** Every capability maps to a documented Claude Code / Codex primitive — cited in `docs/07-practices.md`.\n\nStatus: alpha. The core — manual, hooks, subagents, commands, MCP, plugin — is stable and dogfooded daily. The SDLC pipeline is newer: its logic is statically validated in CI and exercised via a smoke-test checklist you run on your own repo — treat it as early. The red-team layer is new: its detectors are eval-gated in CI (precision/recall on a labeled corpus) and resist obfuscation (`compass redteam --attack`), but pattern detection is best-effort defense-in-depth, not immunity — and the managed-guardrail adapters are response-parsing contract-tested, with the live Bedrock/Azure calls unverified in CI (need your creds) and no live third-party benchmark scores (see [docs/17]). Dynamic workflows are a Claude Code research preview. 
The human merge/deploy gate is permanent, by design.\n\n** Start here → Using compass** — install, the pieces in plain language, the daily workflow.\n\n[Philosophy](/dshakes/compass/blob/main/docs/00-philosophy.md) · [Architecture](/dshakes/compass/blob/main/docs/01-architecture.md) · [Cost & models](/dshakes/compass/blob/main/docs/02-cost-and-models.md) · [Customize](/dshakes/compass/blob/main/docs/03-customize.md) · [MCP](/dshakes/compass/blob/main/docs/04-mcp.md) · [Plugin & team rollout](/dshakes/compass/blob/main/docs/05-plugin.md) · [LSP](/dshakes/compass/blob/main/docs/06-lsp.md) · [Practices](/dshakes/compass/blob/main/docs/07-practices.md) · [Defaults](/dshakes/compass/blob/main/docs/08-defaults.md) · [SDLC](/dshakes/compass/blob/main/docs/09-sdlc.md) · [Roadmap](/dshakes/compass/blob/main/docs/10-roadmap.md) · [Every agent](/dshakes/compass/blob/main/docs/12-every-agent.md) · [Dynamic workflows](/dshakes/compass/blob/main/docs/13-workflows.md) · [Fleet](/dshakes/compass/blob/main/docs/14-fleet.md) · [Competitive audit](/dshakes/compass/blob/main/docs/15-competitive-audit.md) · [Hardening + frontier](/dshakes/compass/blob/main/docs/16-hardening-and-frontier.md) · [Red-team](/dshakes/compass/blob/main/docs/17-red-team.md) · [Open benchmark](/dshakes/compass/blob/main/docs/18-benchmark.md) · [Provenance](/dshakes/compass/blob/main/docs/19-provenance.md) · [Router module](/dshakes/compass/blob/main/router/README.md) · [ADRs](/dshakes/compass/blob/main/docs/adr)\n\nMIT · built to be shared · contributions welcome", "url": "https://wpnews.pro/news/compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents", "canonical_source": "https://github.com/dshakes/compass", "published_at": "2026-06-21 22:38:35+00:00", "updated_at": "2026-06-21 22:55:58.609388+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "developer-tools", "ai-tools", "ai-infrastructure"], "entities": ["Compass", "Claude Code", "Codex", "Gemini", "GitHub"], "alternates": {"html": 
"https://wpnews.pro/news/compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents", "markdown": "https://wpnews.pro/news/compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents.md", "text": "https://wpnews.pro/news/compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents.txt", "jsonld": "https://wpnews.pro/news/compass-guardrails-and-a-hard-budget-cap-for-ai-coding-agents.jsonld"}}