{"slug": "i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t", "title": "I pointed capgate at Damn Vulnerable MCP. Here's what it caught — and what it couldn't.", "summary": "A developer tested capgate, a compile-time sandbox compiler, against the ten deliberately vulnerable MCP servers in the Damn Vulnerable MCP (DVMCP) project. Capgate cleanly stopped one class of attack by enforcing filesystem boundaries, reduced the blast radius for several others, and was ineffective against one class. The test provides a realistic assessment of capability-based sandboxing for MCP servers.", "body_md": "*A capability compiler meets ten deliberately broken MCP servers. The honest scorecard: it cleanly stops one class, shrinks the blast radius on several, and is useless against another. Knowing which is which is the whole point.*\n\nDisclosure: I'm the author of [capgate](https://github.com/razukc/capgate), the Apache-2.0 sandbox compiler this post puts to the test. The DVMCP project and the other tools mentioned aren't mine; the manifests and compiled output are reproducible from the [repo](https://github.com/razukc/capgate).\n\n[Damn Vulnerable MCP (DVMCP)](https://github.com/harishsg993010/damn-vulnerable-MCP-server) is a teaching project: ten MCP servers, each built to demonstrate one attack — prompt injection, tool poisoning, excessive permission scope, token theft, command injection, and so on. It's the closest thing the ecosystem has to a shared adversarial fixture.\n\n[capgate](https://github.com/razukc/capgate) is a *compile-time* tool. You write a manifest declaring what an MCP server is *allowed* to do — `fs:read:/workspace/**`, `net:connect:api.github.com:443`, nothing else — and it compiles that to a concrete sandbox policy (`docker run` flags, a bwrap argv, or an egress-proxy config). It does **not** run anything, watch traffic, or inspect the server's code. 
It turns a declared capability set into an enforced boundary.\n\nSo this is a fair, falsifiable test: for each DVMCP challenge, I wrote the *honest minimum* manifest, compiled it, and asked one question — **does the boundary capgate emits actually stop the attack?**\n\nThe answer is not \"yes\" across the board, and the cases where it's \"no\" are the interesting ones.\n\nThe vulnerable tool in Challenge 3 (Excessive Permission Scope) advertises \"read a file from the public directory\" and then does this:\n\n```python\n@mcp.tool()\ndef read_file(filename: str) -> str:\n    # VULNERABILITY: doesn't restrict file access to the public directory\n    if os.path.exists(filename):          # any absolute path works\n        with open(filename, \"r\") as f:\n            return f.read()\n```\n\nThe private directory next door holds `employee_salaries.txt`, `acquisition_plans.txt`, and `system_credentials.txt` (a live DB password and cloud API keys). A prompt-injected agent just calls `read_file(\"/tmp/dvmcp_challenge3/private/system_credentials.txt\")` and walks out with everything.\n\nThe honest manifest — what the tool *claims* to need:\n\n```\n{ \"name\": \"read_file\", \"capabilities\": [\"fs:read:/tmp/dvmcp_challenge3/public/**\"] }\n```\n\ncapgate compiles it (`--target docker`) to:\n\n```\n--rm --cap-drop ALL --security-opt no-new-privileges --read-only\n--network none\n--volume /tmp/dvmcp_challenge3/public:/tmp/dvmcp_challenge3/public:ro\n```\n\n**The attack now fails — not because the path check got better, but because the private directory is not mounted into the container.** `read_file(\"/tmp/.../private/system_credentials.txt\")` returns *file not found*, because inside the sandbox that file does not exist. The path-traversal bug is still in the code; capgate made it unreachable. Network is off, the filesystem is read-only, every capability is dropped.\n\ncapgate is loud about one approximation it made here. 
The output carries a `notes[]` entry: *\"fs: `/tmp/dvmcp_challenge3/public/**` lowered to volume mount `/tmp/dvmcp_challenge3/public` — Docker mounts directories, not globs. Fine-grained glob enforcement is the server's job.\"* The declared capability was a glob; Docker can only mount a directory. capgate grants the directory and says so in the output.\n\nThis is capgate's bullseye. The vulnerability *is* over-broad reach, and a capability boundary is exactly the right shape of answer. One of ten — but it's a clean kill.\n\nChallenges 7, 8, and 9 are the honest middle. capgate doesn't stop the bug; it shrinks what the bug can achieve.\n\nThe Challenge 7 (Token Theft) tool leaks a bearer token and API key into an error string (which flows straight into the LLM context):\n\n```\nAuthorization: Bearer {email_token.get('access_token')}\nAPI Key: {email_token.get('api_key')}\n```\n\ncapgate can't stop the tool from *reading* its own token. What it can do is constrain where that token can *go*. The honest manifest declares one egress endpoint, and the `--target egress --egress-target squid` output is:\n\n```\n# capgate-egress.squid.conf (generated — do not edit)\nacl to_private dst 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8 169.254.0.0/16 ::1/128 fc00::/7 fe80::/10\nhttp_access deny to_private\nacl cg_dst_0 dstdomain api.emailpro.com\nacl cg_port_0 port 443\nhttp_access allow cg_dst_0 cg_port_0 CONNECT\nhttp_access deny all\n```\n\nA poisoned tool that tries to POST the token to `attacker.example.com` is refused at the proxy — the allowlist contains exactly one host, and the config ends in an unconditional `deny all`. The classic prompt-injection-to-exfiltration chain is broken at the network boundary.\n\n**Honest caveat, stated plainly:** the token still reaches the model's context, and if an attacker can smuggle it out through the *one allowed channel* (a crafted request to `api.emailpro.com` itself), capgate does not see it. It closes the broad exfil path, not every conceivable one. 
(A second honesty note: DVMCP stores these tokens in a world-readable file; a faithful capgate manifest would never grant `fs` access to that file, so the tool couldn't read it at all. The egress allowlist is the backstop for when the secret legitimately lives in the process.)\n\nChallenge 8 (Malicious Code Execution) exposes a real limit of the grammar, and it's worth being loud about it. The tool is:\n\n```python\n@mcp.tool()\ndef execute_shell_command(command: str) -> str:\n    result = subprocess.check_output(command, shell=True, ...)   # arbitrary shell\n```\n\n**capgate's capability grammar cannot express \"run arbitrary shell.\"** `exec` is basename-only (`exec:spawn:git`), by design — there is no `exec:spawn:*`. So you *cannot* write an honest manifest that grants this tool what it actually does. capgate's own docs say it: *\"a manifest that under-declares is a bug in the manifest.\"* capgate will not make a shell-exec tool safe, and it doesn't pretend to.\n\nWhat it does instead is contain the blast radius of the surrounding server. Compile the *legitimate* tools (`get_system_info`, `analyze_log_file`) and you get:\n\n```\n--rm --cap-drop ALL --security-opt no-new-privileges --read-only\n--network none\n--volume /tmp/dvmcp_challenge8/logs:/tmp/dvmcp_challenge8/logs:ro\n```\n\nIf `execute_shell_command` ships anyway and fires, it runs inside *that* box: no network, no Linux capabilities, read-only rootfs, no injected secrets, only the logs directory visible. Successful RCE that can't reach the network, can't escalate, and can't see a credential is a dramatically smaller incident. That's defense-in-depth — explicitly *not* prevention.\n\nIn Challenge 9, `network_diagnostic(target, options)` pipes user input straight into `shell=True`. 
It's a network tool, so the honest manifest must grant `net:connect:*` — and capgate is honest about what that costs:\n\n```\n{ \"egress\": [{ \"host\": \"*\", \"port\": null, \"blockPrivate\": true }] }\n```\n\nA wildcard host means the egress *allowlist* can't help — you can't allowlist \"everywhere.\" But `blockPrivate` is automatically set, and the `nftables` target enforces it in-kernel:\n\n```\ntable inet capgate {\n  chain egress {\n    type filter hook output priority 0; policy drop;\n    ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16 } drop\n    ...\n  }\n}\n```\n\nSo command injection still runs, and still reaches the public internet — but it *cannot* pivot to `169.254.169.254` (cloud metadata), `127.0.0.1` (local services), or RFC1918 internal hosts. And capgate refuses to fake the rest: the wildcard rule shows up in an `unenforceable[]` field with the reason *\"nftables filters IPs, not hostnames; `'*'` cannot be expressed as an IP allowlist. Use the 'squid' target for wildcard/hostname rules.\"* It tells you what it can't do — and where to go instead — rather than silently dropping it.\n\nThe Challenge 1 tool has no teeth at all — it reads an in-memory dictionary:\n\n```python\n@mcp.tool()\ndef get_user_info(username: str) -> str:\n    users = {\"admin\": \"System administrator with full access\", ...}\n    return f\"User information for {username}: {users.get(username)}\"\n```\n\nThe attack isn't about what the tool *reaches*. It's about convincing the model, through injected text, to ignore its instructions. 
The honest manifest is empty (`\"capabilities\": []`), and capgate compiles it to the most locked-down sandbox it can produce:\n\n```\n--rm --cap-drop ALL --security-opt no-new-privileges --read-only --tmpfs /tmp --network none\n```\n\n**And the prompt injection still works, completely.** capgate constrains what a tool is allowed to *do*; it has nothing to say about whether the LLM can be *talked into* doing it. Challenges 1, 2 (tool poisoning), and 6 (indirect injection) all live at the model layer, and a capability compiler is the wrong instrument for all three. It shrinks the blast radius if those attacks then try to *reach* something — but it does not prevent the manipulation itself.\n\nAnyone who tells you a sandbox compiler stops prompt injection is selling you something. It doesn't. It makes prompt injection *less useful* by capping what the hijacked tools can touch.\n\n| # | Challenge | capgate's effect |\n|---|---|---|\n| 1 | Basic Prompt Injection | ❌ Doesn't prevent (model layer) — only caps blast radius |\n| 2 | Tool Poisoning | ❌ Doesn't prevent (model layer) — only caps blast radius |\n| 3 | Excessive Permission Scope | ✅ Prevents — the bullseye |\n| 4 | Rug Pull | ◐ The declared capability set is the contract drift violates; `assert:` records it. No runtime enforcement in v0.0.x |\n| 5 | Tool Shadowing | — Out of scope (naming/registry) |\n| 6 | Indirect Prompt Injection | ❌ Doesn't prevent (model layer) — only caps blast radius |\n| 7 | Token Theft | ◐ Contains — egress allowlist blocks exfil; token still readable |\n| 8 | Malicious Code Execution | ◐ Contains — can't express shell-exec; boxes the blast radius |\n| 9 | Remote Access Control (cmd injection) | ◐ Contains — blocks private ranges; can't allowlist public egress for a net tool |\n| 10 | Multi-Vector | ◐ Partial — depends on the chain |\n\n**One clean prevention. Four meaningful containments. Three honest misses. 
Two out-of-scope.**\n\nThat is the real shape of a capability compiler against a real adversarial corpus. It is not a silver bullet, and the cases it can't touch are exactly the cases the rest of the MCP-security stack (scanners, runtime monitors, the model's own defenses) exists to cover. capgate is one layer. It happens to be the layer that turns \"this server can reach your whole disk and the open internet\" into \"this server can reach one directory, read-only, and one host\" — and that boundary lives in a file you can review in a pull request before the server ever runs.\n\nA static scanner like NVIDIA's SkillSpector lives one layer up: its least-privilege checks would flag Challenge 3 at review time — the tool's code reaches past its declaration, which trips an \"underdeclared capability\" rule before you ever install. But flagging the mismatch and enforcing the honest declaration are different jobs. A scanner tells you the manifest is dishonest; capgate makes an honest manifest *binding*. A scanner can confirm that `fs:read:/tmp/dvmcp_challenge3/public/**` was declared, but only the compiled mount stops the tool from reading the private directory anyway. You want both, and they don't substitute for each other.\n\nThe five capability manifests live in [examples/dvmcp/](https://github.com/razukc/capgate/tree/main/examples/dvmcp) in the capgate repo. Every policy above is the `argv`/`config` payload from `capgate@0.0.3` — the CLI prints a JSON envelope (`{ \"argv\": [...], \"egress\": [...], \"notes\": [...] }`); the blocks above show the payload, and I call out the `notes[]`/`unenforceable[]` fields explicitly where they matter, because those honest edges are the point. 
Run it yourself from the repo root (`npm install && npm run build`):\n\n```\nnode dist/cli.js compile examples/dvmcp/challenge3-excessive-permission.json --target docker --pretty\nnode dist/cli.js compile examples/dvmcp/challenge7-token-theft.json --target egress --egress-target squid --pretty\nnode dist/cli.js compile examples/dvmcp/challenge9-command-injection.json --target egress --egress-target nftables --pretty\n```\n\nIf you run MCP servers and decide their capability boundary by hand today — a devcontainer here, a mount list there — I'd genuinely like to know where that decision lives for you, and what it costs. That's the actual open question this whole exercise is circling.", "url": "https://wpnews.pro/news/i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t", "canonical_source": "https://dev.to/kcrazy/i-pointed-capgate-at-damn-vulnerable-mcp-heres-what-it-caught-and-what-it-couldnt-52i1", "published_at": "2026-06-16 18:52:58+00:00", "updated_at": "2026-06-16 19:17:26.137144+00:00", "lang": "en", "topics": ["ai-safety", "developer-tools", "ai-agents", "ai-infrastructure", "ai-research"], "entities": ["capgate", "Damn Vulnerable MCP", "DVMCP", "Docker", "Apache-2.0", "razukc", "harishsg993010"], "alternates": {"html": "https://wpnews.pro/news/i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t", "markdown": "https://wpnews.pro/news/i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t.md", "text": "https://wpnews.pro/news/i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t.txt", "jsonld": "https://wpnews.pro/news/i-pointed-capgate-at-damn-vulnerable-mcp-here-s-what-it-caught-and-what-it-t.jsonld"}}