cd /news/ai-agents/compass-guardrails-and-a-hard-budget… · home topics ai-agents article
[ARTICLE · art-35928] src=github.com ↗ pub= topic=ai-agents verified=true sentiment=↑ positive

Compass – guardrails and a hard budget cap for AI coding agents

Compass, a local-first configuration layer for AI coding agents, introduces guardrails, cost routing, and a hard budget cap to prevent runaway spending and unsafe actions. The tool enforces a dollar limit that halts sessions before the next tool call, blocks catastrophic commands, and provides a measured guardrail policy scored 100/100 in CI. It aims to give developers verifiable control over AI agent behavior.

read11 min views1 publishedJun 21, 2026
Compass – guardrails and a hard budget cap for AI coding agents
Image: source

budget gate

· guardrails 100/100

· ~61% cheaper routing

· signed releases

· 100% local

· no telemetry

· you always merge

Real session, no edits: the cost climbs to $0.35, then the next action is HALTED at the $0.05 cap — before it spends more.

compass is a local-first config layer for Claude Code, Codex & Gemini that stops your agent from doing three things it shouldn't — burning your budget, running unsafe commands, and merging unverified code. Set COMPASS_MAX_USD=5

and the session hard-stops at the cap; catastrophic commands are blocked before they run, and the guardrail policy is scored 100/100 in CI — not asserted. You install it once, and you always merge.

git clone https://github.com/dshakes/compass ~/compass && cd ~/compass && ./quickstart.sh

▶ See it work · Why it's different · The self-fixing PR loop · ** Install** ·

What's in the box·

📚 Docs

Open a pull request and compass reviews it, security-checks it, runs the tests, cross-audits it with a second model — then pushes its own fixes until it's green. You just merge.

The idea in one line: the loop is the unit of work. A one-shot agent stops at its first wrong answer. compass loopsgenerate → test → critique → fix → repeat against a gate — so quality comes from iteration, not one lucky prompt. The same closed loop runs a single PR, or your whole fleet of repos overnight. (Try it locally in 30s, no tokens — watch it ↓.)

Every AI-agent config claims "safe" and "cheap." compass is the one that hands you the number — and lets a skeptic reproduce it in 30 seconds. Everyone has the same models; the edge is configuration you can trust, not another feature list. Four claims, four commands:

🛡 Guardrails with a score. Catastrophic commands and secret writes are blocked before they run — and the policy is eval-gated, not asserted. (In human terms: it won't let the agent delete your machine or leak your keys, and it can prove how well.)

compass bench     # → guardrail 100% precision/recall (61-case corpus), router 96.9% — in CI

📉 Cost routing that's measured. Cheap work goes to cheap models — scored against an eval set, ~61% cheaper than all-Opus at ~98% quality on a fair mix. (In human terms: it stops paying Opus prices to fix a typo.)

compass route "redesign the auth model"   # → opus
compass route "fix a typo"                 # → haiku

💸 A budget ceiling that actually stops it. Usage trackers report spend; compass enforces it — set a dollar cap and the session is halted before the next tool call once it's reached, live. (In human terms: an agent can't quietly run up a $40 bill while you're away — it stops at your number.)

export COMPASS_MAX_USD=5     # this session hard-stops at $5 — the agent is blocked, not just warned
compass spend --max-usd 5    # the same ceiling on the ledger, for scheduled / fleet runs

🔏 Supply chain you can verify. Releases carry keyless SLSA provenance, so a tampered or look-alike download is rejected. (In human terms: you can prove the code you installed is the code I shipped.)

compass verify v0.17.2     # → ✓ provenance verified

🧪 Red-team resistance, measured. Prompt-injection (direct/indirect/paste), CLAUDE.md poisoning, local safety-override, malware & insecure-code — scored against a labeled corpus that gates in CI, with optional escalation to a managed guardrails service (webhook · Bedrock · Azure). (In human terms: a poisoned repo or web page can't quietly turn your agent against you.)

compass redteam   # → injection corpus 100% P/R, then scans THIS repo's CLAUDE.md/MCP/settings

No service, no telemetry, no --dangerously-skip-permissions

; git pull

to update. The work it can't safely own, it hands back — you keep the merge.

Smallest leap of faith first — the governance moment, then feel it, then see the proof, then see how it works.

0 · The budget ceiling, annotated — the same hard-stop as the hero clip, as a clean walkthrough ($1.80 ✓ → $4.10 ✓ → $5.00 HALTED). Usage trackers report spend; compass enforces it:

1 · The day-to-day feel — guardrails, the cost-aware status line, the loop, and the crew, in ~25 seconds:

2 · The headline, on a real PR — a Blocking bug and red tests, and it pushes its own fix until the PR is green (then waits for you):

3 · How that loop works — review · security · tests · Codex cross-audit run in parallel; Blocking findings get auto-fixed and re-reviewed (round-capped) until green, then it stops at you:

Run it locally in 30s with ~/compass/sdlc/orchestrate.sh "<task>" (no tokens), or wire the GitHub loop for every PR. → how it works · reproduce it

And the everyday status line quietly keeps score, so you watch it earn its keep:

Opus 4.8 · myrepo · main* · 45k ctx · $1.23 · 🧭 🛡1 🧹2 💡1 📉~$1.65

session spend, then today's compass activity: 🛡 footguns blocked · 🧹 files formatted · 💡 policy nudges · 📉~$ estimated saved vs all-Opus. Each piece shows only when there's something to report; nothing leaves your machine.

Autonomy here isn't one big magic button — it's the same closed loop applied at four scales. Each runs until a gate says "done," then stops at a human. That's the whole trick: iteration under a gate beats a single confident guess.

Loop What it drives Where it stops
🔁 The task loop
generate → test → critique → fix → repeat — one change driven to green when tests + review pass
🔎 The review loop
review → auto-fix the Blocking findings → re-review, round-capped (×3) hands off to a human if still red
🛰️ The fleet loop
the whole pipeline, scheduled across every repo you own, overnight, test-gated
a PR per repo, approve from your phone
👥 The workflow loops
parallel agents that fan out, fact-check each other, and converge one synthesized answer

Every loop ends the same way — you merge. That gate never moves.

You want… You need Tokens?
The config, guardrails, CLI, subagents (local)
Claude Code (or Codex/Gemini) + git
None
The autonomous PR loop (GitHub Actions)
A repo with Actions + gh , model auth (CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY ), and SDLC_BOT_TOKEN (fine-grained PAT) so the loop can chain
Yes
Keyless loop (self-hosted runner)
A runner labeled compass + SDLC_BOT_TOKEN
PAT only
The fleet (every repo)
FLEET_TOKEN + FLEET_MAINTAINER
Yes

One command wires the GitHub loop: ~/compass/sdlc/setup.sh --all

(labels + workflows + CODEOWNERS + secrets + branch protection). Without SDLC_BOT_TOKEN

the loop still runs — it just won't auto-re-fire after a fix. → full SDLC setup

Pick the door that fits — all reversible, version-pinnable, no curl | sh. You need an AI assistant (

Claude Code; Codex/Gemini optional) +

git

. No API keys to get the manual, guardrails, crew, and CLI.🍺 Homebrew — managed & versioned

brew tap dshakes/compass https://github.com/dshakes/compass
brew install dshakes/compass/compass     # latest release · --HEAD to track main
compass quickstart                       # previews, asks, then wires it into ~/.claude

📦 Git clone — own & edit your config (recommended)

git clone https://github.com/dshakes/compass ~/compass && cd ~/compass
git checkout v0.17.2     # optional: pin to a release instead of main
./quickstart.sh          # previews every change, asks first, fully reversible

🧩 Claude Code plugin — no terminal (ideal for a team)

/plugin marketplace add dshakes/compass
/plugin install core@compass

🛠️ By hand: make dry-run

(preview) → make install

make doctor

. Symlink install means git pull

/brew upgrade

updates everything; make uninstall

removes only what it added. → Team rollout

For every kind of user: a one-line marketplace/extension install (no terminal), or make install

if you'd rather own the files. Same operating manual + MCP servers, the way each tool expects them:

Agent Native install (no terminal) or own the files
Claude Code
/plugin marketplace add dshakes/compass/plugin install core@compass
make install
Codex
codex plugin marketplace add dshakes/compass/plugin install
make install (~/.codex/AGENTS.md + config.toml )
Gemini CLI
gemini extensions install https://github.com/dshakes/compass
./install.sh --gemini (~/.gemini/GEMINI.md )
Cursor · Copilot · OpenCode · Windsurf
read the repo's AGENTS.md (
clone + make install

CLAUDE.md

· AGENTS.md

· GEMINI.md

are one file (symlinks), and the Claude/Codex plugin manifests + Gemini extension are generated from one source and CI-checked (scripts/check-vendor.sh

) — so a git pull

updates every agent at once and a manifest can't drift.

The marketplace/extension manifests match each vendor's documented schema and are structure-validated in CI. The live install is

manually verified—gemini extensions install

(gemini 0.26.0) andcodex plugin marketplace add

(codex 0.130.0) both succeed against this repo — but isn't run in our CI (those CLIs aren't in the runner).

compass doctor      # validate the install — expect "0 error"
compass status      # is compass active here, and what's loaded?

Then just open Claude Code as usual — the manual, guardrails, subagents, commands, and status line are already loaded. Feel it in a minute: ask for a dangerous command (blocked), run /review

on your diff, or compass route "<task>"

to see the tier it picks. No tokens, no signup for any of it.

Everything below is on after one install or a single opt-in — the autonomous loops above sit on top of this. The README sells; the docs explain — each row links to the detail.

Capability One line Deep dive
🔁 Autonomous SDLC
the review → security → tests → Codex audit → auto-fix → re-review loop; you merge

The fleet* all*your repos through a test gate; approve from your phone14-fleetThe crew + workflows12·13** Guardrails & scanning**compass scan

), auto-format, keep a JSONL audit log16-hardeningRed-team hardening17-red-team** Cost-tier router**router/** The compass CLI**onboard · impact · drift · scan · redteam · sandbox · verify · audit-log · spend · dashboard

11-usingMCP + LSP****version-pinned MCP servers (context7 · fetch · git) + opt-in language-server intelligence04·06Every agent, one sourcestandardAGENTS.md

12-every-agentLive budget ceilingCOMPASS_MAX_USD

) — enforced, not just reported02-costCost disciplinecompass spend

/impact

to see the $02-costBuilt to be trusted before it's run — and honest about its limits.

You own the irreversible. Agents prepare; humans push, merge, deploy. Required checks + a code-owner approval enforce it — there's no "merge to prod" button.Readable & reversible. Nocurl | sh

. The installer backs up what it replaces, is idempotent, andmake uninstall

removes only what it added. Pin a tag, notmain

.Guardrails reduce footguns; they are not a security boundary. Keep least-privilege credentials and review your diffs. (For untrusted code,compass sandbox

is a real boundary.)Red-team hardening is defense-in-depth, not immunity. It warns on prompt-injection (direct/indirect/paste), CLAUDE.md poisoning, and local safety-override, and refuses to grant project-level safety exceptions — but the cardinal rule (external content is data, not instructions) and the human gate are what actually hold.compass redteam

measures it; see.docs/17-red-team.md

What talks to the network. compass phones home to nothing. The auto-registered MCP servers reach non-Anthropic endpoints —context7

→ Upstash (library docs),fetch

→ URLs you request;git

is local. Hooks are short, commented shell scripts inclaude/hooks/

; disable any viaclaude/settings.json

.Grounded, not invented. Every capability maps to a documented Claude Code / Codex primitive — cited in.docs/07-practices.md

Status: alpha.The core — manual, hooks, subagents, commands, MCP, plugin — is stable and dogfooded daily. TheSDLC pipelineis newer: its logic is statically validated in CI and exercised via a smoke-test checklist you run on your own repo — treat it as early. Thered-team layeris new: its detectors are eval-gated in CI (precision/recall on a labeled corpus) and resist obfuscation (compass redteam --attack

), but pattern detection is best-effort defense-in-depth, not immunity — and the managed-guardrail adapters are response-parsing contract-tested, with thelive Bedrock/Azure calls unverified in CI(need your creds) and no live third-party benchmark scores (see[docs/17]).Dynamic workflowsare a Claude Code research preview. The human merge/deploy gate is permanent, by design.

** Start here → Using compass** — install, the pieces in plain language, the daily workflow.

Philosophy · Architecture · Cost & models · Customize · MCP · Plugin & team rollout · LSP · Practices · Defaults · SDLC · Roadmap · Every agent · Dynamic workflows · Fleet · Competitive audit · Hardening + frontier · Red-team · Open benchmark · Provenance · Router module · ADRs

MIT · built to be shared · contributions welcome

── more in #ai-agents 4 stories · sorted by recency
── more on @compass 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/compass-guardrails-a…] indexed:0 read:11min 2026-06-21 ·