{"slug": "a-user-space-firewall-that-gates-an-ai-agent-s-actions", "title": "A user-space firewall that gates an AI agent's actions", "summary": "Guardian, an open-source user-space firewall for AI agents, has released v0.1.0, intercepting and evaluating agent actions with a deterministic policy engine. In testing, it reduced prompt-injection attack success from 100% to 0% on the AgentDojo banking suite and achieved 0% false negatives and false positives on its own benchmark. The tool is agent-agnostic and designed to prevent unauthorized file, shell, network, and service access.", "body_md": "Status: working product, v0.1.0 released. Rust workspace, 196 tests green. Implemented through Phase 4 (hardening): the deterministic policy engine, the tamper-evident audit log (optionally sealed-key signed), the advisory Checker, the MCP gateway + stdio transport, the daemon + control socket, the terminal approval cockpit (TUI), the AgentDojo eval harness, the network proxy with TLS interception (broker-injected credentials, exfiltration inspection, default-deny egress, cockpit `ask`-routing), the OS exec sandbox, the token broker (OS keychain + least-privilege caveats), lightweight verifiable credentials, adaptive suggestions + safety report, ed25519-signed community policy packs, and an intrinsic critical-category floor (money / credentials / exfiltration / irreversible deletion can never resolve to a silent `allow`, not even via a signed pack). Getting started: `docs/user-guide.md`. Remaining for 1.0: signed/notarized packaging and the desktop GUI — see `ROADMAP.md`.\n\nEvaluation: on AgentDojo with a local 12B agent, Guardian cuts the prompt-injection attack-success rate on the banking suite from 100% → 0% (deterministic deny on money-movement). Our own [GuardianBench] — a benchmark built for an action-firewall — scores 0% false-negatives, 0% false-positives, 100% refusal-correctness across 8 domains, plus 0% PII leaks in its tokenization layer (the data broker, ADR-0005). 
See `evaluation/` for the full, honestly-caveated scorecard (including where an action-firewall's scope ends — below).\n\n📄 White paper: [design & threat model (PDF)] — or read it [on GitHub] (with diagrams of how it works and its impact on the agent).\n\nLicense: [Apache-2.0] · Governance: [CONTRIBUTING] · [SECURITY] · [CODE_OF_CONDUCT] · [ADRs]\n\nThis README is the canonical spec (idea, full feature set, architecture, threat model). For how and in what order it's built, see `ROADMAP.md`; for what's landed, see `docs/changelog.md`.\n\nGuardian is early-stage software (v0.1.0) that can be configured to handle sensitive data (credentials, personal data, financial details). It is provided \"AS IS\", without warranty of any kind, under the [Apache-2.0] license (see Sections 7–8). To the maximum extent permitted by law, the author accepts no liability for any damage, data loss, security breach, financial loss, or other harm arising from the use, misuse, or inability to use this software. You are solely responsible for evaluating its fitness for your purpose, for how you configure your policy, and for the security of any data you route through it. It is not certified, audited, or production-hardened, and must not be relied upon as the sole safeguard for high-stakes or regulated workloads. See [SECURITY.md] for the threat model and how to report a vulnerability.\n\n**Guardian** is a local, user-space \"firewall\" that sits between an autonomous\nAI agent and the things it can touch — your files, your shell, the network, and\nthe online services you delegate to it. It does **not** trust the agent. Every\naction the agent attempts is intercepted *as a structured action* at the\nagent's tool/MCP boundary, evaluated by a **deterministic policy engine**, and\n— when a decision needs a human — explained in plain language by a separate\n\"translator\" model before you approve or deny it. 
Guardian is **agent-agnostic**\n(it does not care whether the agent is driven by Claude, GPT, Llama, or anything\nelse) and **OS-friendly** (it never installs a kernel module or fights the\noperating system for control).\n\n**Fastest — download a prebuilt binary** (no toolchain needed) from the\n[latest release](https://github.com/Vadale/project-guardian/releases/latest).\nIt's unsigned, so the OS asks once: macOS → right-click → *Open*; Windows →\nSmartScreen → *More info → Run anyway*; Linux → `chmod +x guardian`. Then `guardian --help`.\n(Windows is experimental/untested — see [docs/user-guide.md](/Vadale/project-guardian/blob/main/docs/user-guide.md).)\n\n**Set it up in one command.** `guardian init` creates `~/.guardian/{config.toml,policy.toml}`\nfor your role and prints the exact next steps + the MCP snippet to paste:\n\n```\nguardian init                         # or: --role personal-assistant\nguardian-daemon                       # terminal 1 — the service\nguardian ui                           # terminal 2 — the approval cockpit (TUI)\n```\n\nThen point your agent's MCP client at Guardian (the snippet `guardian init` prints —\nworks for Claude Code, Cursor, or any MCP client):\n\n```\n{\n  \"mcpServers\": {\n    \"guardian\": { \"command\": \"guardian\", \"args\": [\"mcp\", \"--daemon\", \"/tmp/guardian.sock\"] }\n  }\n}\n```\n\nWhen an action needs your approval the daemon raises a **desktop notification**, so you\ndon't have to watch the cockpit (set `notifications = false` in the config to disable).\n\n**Or build from source** — requires the [Rust toolchain](https://rustup.rs):\n\n```\ncargo build --release\n\n# 1) see the traffic-light mediation end to end (scripted, no setup)\ncargo run -p guardian-cli -- demo\n\n# 2) the internal red-team scorecard (deterministic, no model needed)\ncargo run -p guardian-cli -- eval\n#    ...and GuardianBench, our action-firewall benchmark (FN 0% / FP 0% / refusal 
100%):\nGUARDIAN_BIN=target/release/guardian python3 evaluation/guardianbench/guardianbench.py\n\n# 3) the full loop for a real agent — three terminals:\nGUARDIAN_SOCK=/tmp/g.sock cargo run -p guardian-daemon       # the service\nGUARDIAN_SOCK=/tmp/g.sock cargo run -p guardian-cli -- ui    # the approval cockpit (TUI)\n# then point an MCP client (e.g. Claude Code) at:\n#   guardian mcp --daemon /tmp/g.sock\n```\n\nRun the tests with `cargo test --workspace`. Measuring Guardian's effect on an\nagent's attack-success rate: [evaluation/](/Vadale/project-guardian/blob/main/evaluation).\n\nAgents went from \"chatbots that talk\" to \"agents that act\" — they read and write files, run shell commands, browse, buy things, send email, and increasingly touch sensitive accounts (banking, health records, public-administration portals). That creates four concrete risks:\n\n- **Sensitive-data exposure & destructive mistakes.** Giving an agent direct access to accounts, email, and private documents exposes the user to privacy violations, hallucinated destructive actions, and external attacks.\n- **Prompt injection.** The dominant agent-security threat of this era: content the agent *reads* (a web page, a PDF, an email, a tool result) can contain instructions that hijack the agent into doing something the user never asked.\n- **Click fatigue / informed-consent failure.** System-level agents pop up approval requests for scripts and API calls. Non-technical users do not understand them and approve everything blindly, which nullifies the safety.\n- **No human-facing control surface and no traceability.** Existing tooling (raw harness permission prompts, Docker) is built for programmers. There is no intuitive \"control room,\" and no easy way to keep a tamper-evident record of what an agent actually did (relevant for transparency obligations such as the EU AI Act, Art. 50).\n\nThese are the rules that decide every later trade-off.\n\n**The security boundary is deterministic. 
The LLM is never the boundary.** Enforcement (allow / ask / deny) is done by a rule engine whose behavior is predictable and testable. An LLM can be *wrong* and can be *attacked* via prompt injection, so it is used only to **translate and risk-score**, never to unlock.\n\n**Intercept structured actions, not the agent's prose.** The policy engine and the translator look at the *real* intercepted action (the tool call and its arguments, the actual HTTP request, the file operation) — never at the agent's natural-language claim about what it intends to do. The claim is manipulable; the action is not.\n\n**Agent-agnostic by construction.** Control is applied at the action boundary, which is identical regardless of which model produced the action.\n\n**User-space, not kernel-space.** No kernel modules, no OS hooks that require vendor-granted entitlements. (See §4 — this is the central decision.)\n\n**Local-first / privacy-first.** Policy evaluation, learning, and the audit log live on the user's machine. Sending anything to the cloud is opt-in and explicit.\n\n**Defense in depth.** Mediation at the tool boundary is the primary control; OS sandboxing and a network proxy are containment backstops, not the plan A.\n\n**Fail closed on the critical path, fail open on convenience.** A failure in the money/credential/exfiltration path blocks; a failure in a low-risk path degrades gracefully (logs, defers to existing harness defaults).\n\n**Tamper-evident by default.** Everything Guardian decides is written to an append-only, hash-chained, signable audit log.\n\n**Resolved: Guardian acts at the agent's action boundary — the harness /\ntool-call / MCP layer — in user-space. It does NOT act in the OS kernel.**\n\n- Deep OS interception (Linux LSM/eBPF beyond user-space, macOS Endpoint Security\n& Network Extension, Windows minifilter/WFP kernel callouts) requires\n**vendor-granted entitlements, code-signing, notarization, and per-platform certification**. 
On macOS and Windows this is a wall for an open-source project and a solo/community maintainer.\n- Kernel-level bugs crash the user's machine. The blast radius of a mistake is the whole OS.\n- It is the wrong altitude: at the syscall level you see `write(fd, buf, n)`, not *\"the agent is about to wire €4,000 to an unknown IBAN.\"* Intent is legible at the action boundary, not at the kernel.\n\nModern agent harnesses (Claude Code, Cursor, the OpenAI Agents runtime, and any\nMCP-speaking client) already mediate **everything** the agent does through a\n**tool-call interface**. The agent cannot touch the world except by calling a tool\nthe harness exposes. **The harness is already the choke point** — Guardian's job\nis to *be*, *wrap*, or *plug into* that mediation layer instead of fighting the OS\nfor a second, redundant one.\n\nThis gives us, for free:\n\n- **Structured actions** (tool name + typed arguments) instead of guessed intent.\n- **Agnosticism** — the tool boundary looks the same under any model.\n- **No entitlements, no kernel, no notarization headaches.**\n- **Cross-platform parity** — the same logic runs on macOS, Windows, Linux.\n\nHarness-level interception is only as complete as the harness's own mediation.\nThe hard case is a **raw Bash/exec tool**: once `bash` runs, its sub-behaviors\n(subprocesses, interpreters, raw syscalls, `base64 -d | sh`) are *not* individually mediated. Text-scanning the command is **not** a security boundary. We handle this with a layered answer:\n\n**Prefer structured tools over raw shell.** Where the harness allows it, expose mediated, typed tools (read_file, write_file, http_request, send_email) instead of a raw shell. 
Structured tools are fully policy-able.\n\n**Contain the dangerous tools.** When raw `exec`/`shell`/network *must* exist, run that tool's execution inside an **off-the-shelf OS sandbox** (container, `sandbox-exec`/Seatbelt profile, bubblewrap, Windows AppContainer/Sandbox) and inside a **network proxy** (below). This is defense-in-depth using existing, user-space tooling — not custom kernel work.\n\n**Mediate the network regardless.** A user-space **forward proxy with an installed CA** (mitmproxy-style) catches *all* HTTP(S) no matter how it was made, which is where network policy, header signaling, and content watermarking actually happen.\n\nSo the layered model is: **mediate at the tool boundary (plan A) → contain\nhigh-risk tools in a sandbox + route all traffic through the proxy (backstop).**\n\n```\n            ┌──────────────────────────────────────────────────────────┐\n            │  Agent (any model: Claude / GPT / Llama / local / …)        │\n            └──────────────────────────────────────────────────────────┘\n                               │  structured action (tool call / MCP / HTTP)\n                               ▼\n   ┌───────────────────────────────────────────────────────────────────────┐\n   │                            GUARDIAN CORE                                 │\n   │                                                                          │\n   │  ┌────────────────────┐   ┌──────────────────────┐  ┌────────────────┐  │\n   │  │ 1. POLICY ENGINE    │   │ 2. CHECKER (LLM)      │  │ 3. 
AUDIT LOG    │  │\n   │  │ deterministic       │──▶│ translator + risk     │  │ append-only,    │  │\n   │  │ allow / ask / deny  │   │ score — ADVISORY ONLY │  │ hash-chained,   │  │\n   │  │ (the boundary)      │   │ never unlocks         │  │ signable        │  │\n   │  └────────────────────┘   └──────────────────────┘  └────────────────┘  │\n   │           │ \"ask\"                                                         │\n   │           ▼                                                               │\n   │  ┌────────────────────┐   ┌──────────────────────┐  ┌────────────────┐  │\n   │  │ 4. APPROVAL UI      │   │ 5. IDENTITY & TOKEN   │  │ 6. ADAPTIVE     │  │\n   │  │ traffic-light       │   │ BROKER: scoped OAuth, │  │ LEARNING        │  │\n   │  │ dashboard + report  │   │ macaroons, keychain/  │  │ (constrained,   │  │\n   │  │                     │   │ Secure Enclave/TPM    │  │ local only)     │  │\n   │  └────────────────────┘   └──────────────────────┘  └────────────────┘  │\n   └───────────────────────────────────────────────────────────────────────┘\n        │ filesystem            │ network                 │ credentials\n        ▼ structured FS tools   ▼ forward proxy (MITM CA) ▼ broker injects creds\n          + optional sandbox      + header/watermark        at proxy; agent never\n          for raw exec            injection                  sees raw secrets\n```\n\nIn priority order from most to least agnostic:\n\n**(a) MCP gateway / proxy** — *primary, most agnostic.* Guardian runs as an MCP server that aggregates and re-exposes the user's real MCP servers and tools. The harness points at Guardian; every `tools/call` passes through the policy engine before being forwarded. Works with any MCP-speaking client.\n\n**(b) Native hook adapter** — for harnesses with a hook system (e.g. Claude Code's `PreToolUse`/`PostToolUse`). Guardian registers as the hook handler and returns allow/ask/deny per call. 
Lowest friction where available; gives a true deterministic deny.\n\n**(c) HTTP(S) forward proxy** — a user-space CONNECT proxy with a locally installed CA. Intercepts *all* outbound traffic for network policy, the agent-signaling header, and content watermarking. This is the only \"system-ish\" piece and it is still pure user-space (a proxy + a trusted cert).\n\n**(d) LLM gateway proxy** — *optional.* Proxy the model API itself to capture the raw request/response, strip injected instructions from tool results, and attach provenance. Useful but not required for MVP.\n\n- Deterministic evaluation of declarative rules → exactly one of `allow`/`ask`/`deny` per action.\n- Conditions over structured fields (tool name, arguments, target host, file path,\namount, time, source) using a sandboxed, side-effect-free expression evaluator\n(candidate: **CEL** or an **OPA/Rego**-style evaluator — chosen for being decidable and testable, not Turing-complete scripting).\n- **No network, no LLM, no I/O inside evaluation.** Pure function of (action, context, policy). This is what makes it auditable.\n\n- Two jobs: (1) **translate** a pending action into plain language with its real impact (`chmod 777 ./data` → *\"lets anyone on this machine read and modify these files\"*); (2) emit a **risk score / rationale** to help rank the approval queue.\n- **It never decides allow/deny.** Even a fully compromised Checker cannot unlock an action the policy engine denied.\n- It reads the **structured intercepted action**, never the agent's prose.\n- Model is **pluggable**: a small local guard model by default (privacy), with an *opt-in* escalation to a stronger model for ambiguous cases. This resolves the doc's old contradiction between \"learns locally, no cloud\" and \"high-performance checker.\"\n\n- Cross-platform desktop app (candidate: **Tauri** — lightweight, Rust-friendly — over Electron). 
\n- The **traffic-light** surface: a queue of `ask` items with the Checker's plain translation and a one-click allow/deny; a live activity view; the weekly/monthly **report** (the \"safety service report\") that summarizes silently-allowed actions, blocked threats, and rule suggestions to batch-confirm — the answer to click fatigue.\n\n- Append-only, **hash-chained** (each entry commits to the previous), optionally signed with a per-install key held in the OS keychain / Secure Enclave / TPM.\n- Records: action, decision, matched rule, Checker rationale, user response, time.\n- This is what makes \"traceability\" real rather than marketing.\n\n- The agent **never sees raw credentials.** It asks the broker to perform an authenticated action; the broker injects credentials at the proxy layer.\n- Mechanisms (replacing the old \"ZKP\" idea, which was the wrong primitive):\n  - **Scoped OAuth 2.1 tokens** where the service supports them.\n  - **Macaroons** — bearer tokens with attenuating *caveats* (expiry, max amount, allowed endpoints, source binding) — purpose-built for delegated, narrowable authority. 
This is the core mechanism for \"let the agent do X but only X.\"\n  - **Hardware-backed keys** (Secure Enclave / TPM) for signing and secret storage.\n  - **W3C Verifiable Credentials / DIDs** for decentralized identity claims.\n  - **PSD2 / Open Banking APIs** as the correct rail for EU banking (not screen-scraping a bank with a spoofed header).\n\n- Design note: because the broker holds credentials, **Guardian itself becomes the highest-value target.** Its own hardening (signed policies, sealed keys, notarized builds) is a first-class requirement, not an afterthought.\n\n- May **suggest** downgrading a repeatedly-approved low-risk action to \"auto-allow in this context,\" and may upgrade an action seen in anomalous context (odd hours, unknown destination) to \"ask.\"\n- **Hard rule: never auto-downgrade critical categories** — money movement, credential access, data exfiltration, irreversible deletions. Auto-habituation is exactly the vector an attacker would farm, so these always require explicit human action. Suggestions decay over time and are bound to the context in which they were learned.\n- 100% local. No behavioral data leaves the machine.\n\nGuardian only works if the supervised agent cannot escape or switch it off:\n\n- Guardian is the agent's **only** path to tools. If the daemon is unreachable, there is no decision, so there is no action — **fail closed**, not open.\n- The agent process runs **without authority** to kill the Guardian daemon, remove/alter the proxy CA, or edit the active policy (separate privilege + sealed policy). A bypass attempt is recorded as a gap in the tamper-evident log.\n- Supervised restart: if the daemon dies, the integration refuses to forward tool calls until Guardian is back.\n\nA one-action, always-available control that **immediately revokes all agent\nauthority**: pause every adapter, deny all pending and new actions, and freeze\nbrokered credentials. 
Reachable from the UI and the CLI (`guardian panic`), and\ndesigned to work even if the Checker/UI is unresponsive.\n\n- A single declarative config (TOML) selects the active role/policy pack, the Checker backend, and which adapters are enabled. Validated on load; invalid config fails closed.\n- **Safe defaults:** first run starts in the most restrictive role (everything unknown → `ask`), egress allow-list empty, learning off. The user widens deliberately — never the reverse.\n\n- All state (audit log, learned suggestions, config) is stored **locally** under the OS user-data dir; secrets live in the OS keychain, never plaintext on disk.\n- The audit log is append-only and hash-chained; retention is user-configurable and export is explicit. Nothing is transmitted unless the user opts into remote Checker escalation (the only network path), which is per-action and logged.\n\nThe Checker's plain-language explanations and all UI copy are **localizable** (the\naudience is non-technical users worldwide). Decisions and logs stay in a stable\nmachine-readable form; only the human-facing presentation is translated.\n\n| # | Feature | What it does | Correction vs. the original idea |\n|---|---|---|---|\n| A | Maker/Checker | Maker = the third-party agent doing the task; Checker = local model that translates pending actions into plain language + risk score. | Checker is advisory only and reads structured actions, not the agent's prose. The deny decision is the deterministic engine's. |\n| B | Traffic-light permissions | Green = silent allow; Yellow = pause + translated approval popup; Red = auto-block + notify. Roles (e.g. \"Web Dev\", \"Tax Assistant\") preload rule sets. | Enforcement is the deterministic policy engine (capability/attribute-based), with roles as presets. The traffic light is the UI on top. |\n| C | Adaptive security | Learns to downgrade safe repeated actions and upgrade anomalous ones, locally. 
| Never auto-downgrades critical categories; context-bound; decaying; local-only. |\n| D | Reporting | Batches low-risk auto-approvals into a periodic report; user confirms suggested rules. | Unchanged — this is good design and directly fights click fatigue. |\n| E | Provenance / watermarking | Optional content footer on AI-generated email/comments/forms; optional agent-signaling HTTP header; everything logged. | The HTTP header is a courtesy signal only — not security (trivially spoofable, can trigger anti-bot blocks). Content marking helps the user's AI-Act transparency duty but does not by itself make anyone \"compliant.\" Treat as opt-in, default-off for the header. |\n| F | Identity & tokens | Broker holds credentials; agent gets scoped, time/amount-limited authority. | Capability tokens / macaroons / scoped OAuth / hardware keys / VCs — not ZKP (which was the wrong primitive for delegation). |\n\n| Adversary | Vector | Primary defense |\n|---|---|---|\n| Hallucinating / misaligned agent | Issues a destructive or wrong action in good faith | Deterministic policy engine; `ask` for anything irreversible; Checker translation so the human catches it |\n| Prompt injection (the #1 threat) | Malicious instructions hidden in web pages, PDFs, emails, or tool results steer the agent | Policy evaluates the real action regardless of why the agent wants it; critical categories always `ask`/`deny`; optional LLM-gateway sanitization of tool results; Checker never reads attacker-controlled prose as authority |\n| Compromised / malicious MCP server or tool | A tool lies about what it does, or exfiltrates | Guardian sees the actual call/args and the actual network egress (proxy), not the tool's self-description; egress allowlists |\n| Sensitive-data leak via the agent's output | An injection makes the agent put account numbers, who-did-what, or secrets into its response/reasoning | For data to reach an attacker it must cross a channel (message / network / file) — which is an 
action Guardian gates; the token broker (§5.6) keeps credentials and full secrets out of the agent entirely (injected only into the authorized egress, so the agent cannot reveal what it never held); residual data the agent legitimately handled is bounded by least-privilege + a planned output-guard (Checker-side DLP redaction). The action-firewall does not police the agent's prose by design |\n| Malicious policy pack (supply chain) | A community pack whitelists an exfiltration path | Signed packs, mandatory review, reputation, a \"packs cannot widen critical-category permissions without explicit user opt-in\" rule, and an intrinsic runtime floor — critical-category actions can never resolve to a silent `allow` regardless of any rule or pack |\n| Attack on Guardian itself (it holds the keys) | Tamper with policy, steal brokered creds, forge the log | Sealed keys (Secure Enclave/TPM), signed/locked policy, hash-chained log, notarized builds, fail-closed |\n| Agent disables / bypasses its guardian | Kill the daemon, remove the proxy CA, or route around the gateway to escape supervision | Guardian is the agent's only path to tools (no decision → no action = fail closed); the agent has no authority to stop the daemon or alter CA/policy; supervised restart; gaps flagged in the log. See §5.8 |\n| Habituation attack | Train the adaptive layer to auto-allow, then strike | Never auto-downgrade critical categories; decay + context binding |\n| Malicious user | Tries to use the agent + Guardian for harmful/abusive ends | Out of scope for \"protect the user,\" but Guardian's provenance/logging makes misuse attributable |\n\nFull threat model — assets, trust boundaries, attack trees, residual risks, and the OWASP/NIST mapping — lives in `docs/threat-model.md`.\n\nDeclarative, reviewable, version-controlled. 
Illustrative only — final schema TBD.\n\n```\n# role: \"personal-assistant\"\nversion: 1\ndefaults:\n  decision: ask            # unknown actions default to human review\n\nrules:\n  - id: read-project-files\n    when: tool == \"read_file\" && path.startsWith(\"~/DOCUDESK/\")\n    decision: allow         # GREEN: silent\n\n  - id: shell-anything\n    when: tool == \"exec\"\n    decision: ask           # YELLOW: pause + translate\n    sandbox: true           # and run it contained, regardless of approval\n\n  - id: chmod-world-writable\n    when: tool == \"exec\" && args.cmd matches \"chmod\\\\s+(777|o\\\\+w)\"\n    decision: ask\n    explain: \"Makes files modifiable by any user on this machine.\"\n\n  - id: outbound-known-hosts\n    when: tool == \"http_request\" && host in trusted_hosts\n    decision: allow\n\n  - id: money-movement\n    when: capability == \"payment\"\n    decision: ask\n    critical: true          # may NEVER be auto-downgraded by learning\n    cap: { amount_max: 200, currency: \"EUR\" }\n\n  - id: bulk-delete\n    when: tool == \"delete\" && args.count > 10\n    decision: ask\n    critical: true\n\n  - id: data-exfiltration\n    when: tool == \"http_request\"\n            && method == \"POST\"\n            && body.contains_secret\n            && host not in trusted_hosts\n    decision: deny          # RED: auto-block + notify\n    critical: true\n```\n\n**Achieved by:**\n\n- Intercepting at the **action boundary** (MCP/tool/HTTP), which is identical under any model.\n- A **pluggable Checker model** (local or remote, user's choice).\n- **Per-harness adapters** that all feed the same policy engine.\n\n**We deliberately do NOT:**\n\n- ❌ Install kernel modules or use OS hooks requiring vendor entitlements.\n- ❌ Let any LLM be the allow/deny boundary.\n- ❌ Treat the spoofable `User-Agent` header as a security control. 
\n- ❌ Use ZKP as the delegation primitive (use macaroons / scoped tokens / VCs).\n- ❌ Auto-downgrade critical-category actions via learning.\n- ❌ Claim Guardian \"makes the user legally compliant\" — it *helps* with transparency/traceability; legal sign-off is the user's.\n- ❌ Send behavioral/learning data to the cloud (Checker escalation is the only network path, and it is opt-in).\n\n**Out of scope (for now) — explicitly deferred, not forgotten:**\n\n- Multi-agent / agent-to-agent supervision (an OWASP Agentic 2026 risk class) — the current model guards a single agent; multi-agent mediation is future work.\n- Deep OS/kernel interception (see §4) — never in scope.\n- Any proprietary/enterprise tier — this repo is fully open source.\n\n- Set up the Rust workspace (**Rust decided — ADR-0001**; see ROADMAP §0).\n- Repo scaffolding, license (**Apache-2.0**, see `LICENSE`), CI, contribution guide.\n- Define the **action model** (the canonical structured representation every adapter normalizes into).\n- Write the formal **threat model** and **policy schema** as living specs.\n\n- **MCP gateway adapter** (primary) for one MCP-speaking harness.\n- **Deterministic policy engine** with the declarative schema + CEL/Rego-style evaluator + a full test suite (golden cases per rule).\n- **Checker** translator using a pluggable model; reads structured actions only.\n- **Approval UI** (Tauri): traffic-light queue + plain-language explanation + allow/deny.\n- **Tamper-evident audit log** (append-only, hash-chained).\n- **One real demo scenario end-to-end** (e.g. 
agent edits files + makes an HTTP request; Guardian allows greens silently, pauses a yellow with a translated popup, blocks a red exfiltration attempt).\n\n**MVP definition of done:** a non-technical user can watch an agent work, get a\n*human-readable* approval prompt for one risky action, see one bad action blocked\nautomatically, and read a log of everything that happened — with **no LLM in the\ndeny path**.\n\n- **HTTP(S) forward proxy** with installed CA: network policy, egress allowlists, optional agent-signaling header, optional content watermark.\n- **OS sandbox wrapper** for raw `exec` tools (Docker / sandbox-exec / bubblewrap / AppContainer) — defense in depth, off-the-shelf only.\n- **Native hook adapter** (e.g. Claude Code `PreToolUse`).\n\n- **Identity & token broker**: scoped OAuth, macaroons, keychain/Secure Enclave/TPM storage; agent never sees raw secrets.\n- **Constrained adaptive learning** + the periodic **report**.\n- **Signed community policy packs** + the trust/review pipeline (this is the open-core community engine).\n- Optional **LLM gateway proxy** with tool-result sanitization.\n- Additional harness adapters (Cursor, OpenAI Agents runtime, generic MCP).\n\n- **Core / policy engine / proxies:** Rust (security rigor, cross-platform) — Go is an acceptable alternative for proxy/MCP velocity.\n- **Policy expressions:** CEL or an OPA/Rego-style evaluator (decidable, testable).\n- **Desktop UI:** Tauri.\n- **Audit log:** append-only hash-chained store (e.g. SQLite + chained hashes, or a purpose-built log); per-install signing key in OS keychain/Secure Enclave/TPM.\n- **Network proxy:** user-space MITM proxy + locally trusted CA.\n- **Sandbox backstops:** Docker / `sandbox-exec` (macOS) / bubblewrap (Linux) / AppContainer or Windows Sandbox (Windows) — all off-the-shelf.\n\n- Which harness do we target **first** for the MCP gateway? (Drives the demo.)\n- Default local Checker model — which small model balances quality vs. 
footprint?\n- Policy expression language — CEL vs. Rego (DX, sandboxing, ecosystem)?\n- How do signed policy packs get reviewed at community scale without a bottleneck?\n- CA-installation UX for the proxy — how to make trusting a local CA safe and non-scary for non-technical users?\n- How much of the AI-Act transparency story do we promise vs. explicitly disclaim? (Get legal input before any compliance claim ships.)\n\n- **Harness** — the runtime that drives an agent and mediates its tool calls (e.g. Claude Code). Guardian plugs into this layer.\n- **Maker** — the third-party agent performing the user's task.\n- **Checker** — Guardian's local translator/risk-scorer model (advisory only).\n- **MCP** — Model Context Protocol; the tool/server protocol Guardian proxies.\n- **Macaroon** — a bearer credential that can be attenuated with contextual caveats.\n- **Critical category** — money movement, credential access, data exfiltration, irreversible deletion; never auto-downgraded by learning.", "url": "https://wpnews.pro/news/a-user-space-firewall-that-gates-an-ai-agent-s-actions", "canonical_source": "https://github.com/Vadale/project-guardian", "published_at": "2026-06-29 21:38:21+00:00", "updated_at": "2026-06-29 21:52:37.403929+00:00", "lang": "en", "topics": ["ai-safety", "ai-agents", "ai-tools", "ai-policy", "ai-research"], "entities": ["Guardian", "AgentDojo", "Claude", "GPT", "Llama", "Apache-2.0", "GitHub", "Vadale"], "alternates": {"html": "https://wpnews.pro/news/a-user-space-firewall-that-gates-an-ai-agent-s-actions", "markdown": "https://wpnews.pro/news/a-user-space-firewall-that-gates-an-ai-agent-s-actions.md", "text": "https://wpnews.pro/news/a-user-space-firewall-that-gates-an-ai-agent-s-actions.txt", "jsonld": "https://wpnews.pro/news/a-user-space-firewall-that-gates-an-ai-agent-s-actions.jsonld"}}