{"slug": "ai-agent-permissions-the-missing-layer-between-works-and-safe", "title": "AI Agent Permissions: The Missing Layer Between \"Works\" and \"Safe\"", "summary": "AI agents that execute natural language commands on user machines pose serious security risks, including credential exfiltration, file deletion, and prompt injection attacks. Anthropic's telemetry shows users approve approximately 93% of permission prompts, creating a dangerous permission fatigue that undermines human oversight as the last line of defense.", "body_md": "If you’re using Claude Code, this prompt will look very familiar to you. Coding agents act on natural language to determine their next steps and run commands on your machine. But their careless hands could cause disaster: forwarding your credentials or deleting all your prod backups.\n\nAs the human in the loop, you’re the last line of defense. How well can you tell dangerous commands from benign ones under time pressure?\n\nFind out your permission fatigue rating at [llmgame.scalex.dev](https://llmgame.scalex.dev) in just one minute, then continue reading below!\n\n## The real threat\n\nYou’ve seen some threats pop up across the terminal. Luckily it was just a ~~dream~~ test. Let’s go over some of these and the real risks they pose:\n\n- `rm -rf ~/`: a malformed remove command that wipes the home directory. Sometimes linked to overeager command interpretation ([“delete everything”](https://www.theguardian.com/technology/2026/apr/29/claude-ai-deletes-firm-database)), sometimes the result of commands becoming malformed when copied across terminals.\n- *Credential exfiltration* (`cat ~/.aws/credentials`): silent collection of cloud provider credentials or SSH keys. An internal phishing campaign at Anthropic resulted in credentials being successfully exfiltrated in 24 out of 25 attempts.\n- *Scope violations*: reading and modifying files beyond the project directory scope (e.g. `~/Documents`).\n- *Prompt injection*: content copied from external websites or emails that is interpreted as commands along with the user input.\n\nAnthropic just posted a write-up [on how to contain Claude Code](https://www.anthropic.com/engineering/how-we-contain-claude) which covers several more of the risks involved. Even checking out a repository could cause injection, because the Claude settings files from that repository would be loaded into the user’s Claude Code session. Your Claude Code setup can be augmented with skills from external repositories, which could be updated at any time to add malicious prompts; the same goes for connected external MCP servers or plug-ins.\nThese potentially allow credentials or files to be leaked, persistent threats to be installed on your machine, and more. Given that the threat is real, what can you realistically do to avoid it?\n\n## There must be a better way to do this?\n\nThe human-in-the-loop approach has its problems. You may have let a couple of attacks slip through your guard in the game. Anthropic’s containment post also mentions permission fatigue:\n\n> Our telemetry showed users approved roughly 93% of permission prompts. The more approvals a user sees, the less attention they pay to each, becoming over time much less diligent in their supervision\n\nEven if you read every prompt perfectly, the permission model has a blind spot: the agent can edit files without approval, and then ask the user to run them in a seemingly innocent `npm run build` command (thanks to dns_snek in this [Hacker News thread](https://news.ycombinator.com/item?id=48308376)!).\n\nAnd because of the high ‘noise rate’, we quickly become button mashers. To combat this, Anthropic launched [Auto mode](https://www.anthropic.com/engineering/claude-code-auto-mode). 
The setting can be enabled from the CLI and uses fast local filters and a server-side scan to review tool output before it’s parsed by the local Claude Code agent.\nThen, prompts are evaluated again by the coding agent before execution.\n\nAuto mode, however, comes at a price. It sometimes incorrectly links dangerous commands to previous consent signals, and Anthropic reports a *17% false-negative rate*. You probably had a better [high score in the game](https://llmgame.scalex.dev), but that’s hard to keep up all day and glues you to the terminal.\n\n### Destructive call hooks\n\nClaude Code allows you to set up PreToolUse hooks: commands that Claude Code itself runs before a matching tool call, and that can inspect the pending call and block it before it executes.\nYou can find some examples online where this is set up to block `rm -rf /` and the like. These work as a blocklist, so they are not foolproof. Adversaries can work around blocklists by obfuscating the commands (`echo \"ZWNobyAiY291bGQgaGF2ZSBiZWVuIHJtIC1yZiAvICI=\" | base64 -d | bash` as an obvious example).\n\nClaude Code also has a built-in [sandbox](https://code.claude.com/docs/en/sandboxing) mode which you can configure with `/sandbox`. It allows writes only to the working directory, prompts for each new network domain and blocks filesystem access outside the working directory for bash commands.\n\nHooks add a second layer for patterns the sandbox doesn’t cover. Claude will also happily generate some for you if you ask it, so you’re not relying on some random skill file from the internet that can pose another attack vector.\n\n### I also like to live --dangerously-skip-permissions\n\nAn alternative approach is to sandbox your agent and use `--dangerously-skip-permissions`, which will never prompt for permissions. 
The [containment blog post](https://www.anthropic.com/engineering/how-we-contain-claude) from Anthropic covers these elements:\n\n- a hypervisor to create the sandbox\n- a proxy to intercept calls and inspect them for exfiltration risks\n\nYou can use a hosted agent provider (like [Claude Code on the web](https://code.claude.com/docs/en/claude-code-on-the-web)) or contain your own Claude Code agent locally. Anthropic posted instructions on [how to set up devcontainers](https://code.claude.com/docs/en/devcontainer).\nA devcontainer separates the agent from your host system, but the risk of exfiltrating data the container has access to (such as credentials) remains. Restrict the credentials you provide to it, so it can’t drop your production database.\n\n## In conclusion\n\nIt’s a whole new world with a new set of attack vectors. It’s best to remain aware of the risks and how to mitigate them.\nIf you don’t use them already, try out devcontainers (locally or in the cloud), sandbox, auto mode and hooks to minimize your exposure to these risks. 
Running `--dangerously-skip-permissions` without any of the accompanying guardrails makes you all the more vulnerable.\nFor Claude, there’s a useful comparison of all the different sandboxing modes [available here](https://code.claude.com/docs/en/sandbox-environments).\n\nWhich approaches are you running, and what was your high score?", "url": "https://wpnews.pro/news/ai-agent-permissions-the-missing-layer-between-works-and-safe", "canonical_source": "https://scalex.dev/blog/ai-agent-permissions/", "published_at": "2026-05-29 07:29:45+00:00", "updated_at": "2026-05-29 07:47:16.341284+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "ai-policy", "ai-ethics", "large-language-models"], "entities": ["Claude", "Anthropic", "llmgame.scalex.dev", "The Guardian"], "alternates": {"html": "https://wpnews.pro/news/ai-agent-permissions-the-missing-layer-between-works-and-safe", "markdown": "https://wpnews.pro/news/ai-agent-permissions-the-missing-layer-between-works-and-safe.md", "text": "https://wpnews.pro/news/ai-agent-permissions-the-missing-layer-between-works-and-safe.txt", "jsonld": "https://wpnews.pro/news/ai-agent-permissions-the-missing-layer-between-works-and-safe.jsonld"}}