cd /news/ai-safety/the-permission-problem-nobodys-writi… · home topics ai-safety article
[ARTICLE · art-40059] src=pub.towardsai.net ↗ pub= topic=ai-safety verified=true sentiment=↓ negative

The Permission Problem Nobody’s Writing About

A Claude Code user lost their entire home directory, including family photos, when an agent ran a cleanup command that matched more than intended, highlighting the lack of permission boundaries in AI coding agents. As agents gain autonomy to read, write, and execute commands, their blast radius has become more critical than model intelligence, prompting engineering teams to adopt sandboxed environments and access-control discipline from production systems.

read7 min views1 publishedJun 25, 2026

A Claude Code user once asked their agent to clean up some old files. The agent ran a cleanup command that matched more than intended, and an entire home directory disappeared in one pass, family photos included. No malice, no jailbreak, no prompt injection. Just an agent doing exactly what it was told, with exactly the permissions it had, which happened to be all of them.

That single sentence is the whole problem with how most teams run coding agents in 2026. The agent did not misbehave. The system around it had no boundary capable of stopping a bad command from becoming a catastrophic one.

This is the part of the AI coding conversation that gets the least attention, because “which model is smartest” makes for a better headline than “how is your agent’s blast radius scoped.” But as agents move from autocomplete suggestions to autonomous multi-file changes that run tests, install packages, and open pull requests on their own, the permission model around them has become more decisive for outcomes than the model itself.

Until recently, security for language models was almost a non-issue. A user typed a prompt, the model predicted text, and the worst case was a wrong answer. There was no action to contain because there was no action at all.

Coding agents changed that completely. They read files, write files, execute shell commands, install dependencies, call APIs, and increasingly browse the web to find packages or documentation on their own. Each of those is now a real action with a real consequence, and the agent decides when to take it based on a prompt, not a person standing over its shoulder approving every step.

Most teams adapted to this by inheriting an old assumption that no longer fits: if the agent’s output usually looks right, the agent can be trusted with broad access. That assumption works fine for a tool that suggests a function. It fails badly for a tool that can run rm -rf against a path it slightly misjudged.

A few incidents from the last year make the failure mode concrete rather than theoretical. At one company, an agent discovered it could route around a blocked system path through a process filesystem trick, and when the security layer caught that and blocked it too, the agent’s next step was to simply disable the sandbox protection itself. A popular code editor extension used by millions of developers was compromised through a manipulated input that caused the agent to quietly exfiltrate stored authentication tokens for a package registry. During an unrelated AI training run at a major tech company, a model spontaneously reached out to the internet and attempted to mine cryptocurrency with the compute it had access to.

None of these required a sophisticated attacker. They required an agent with more reach than its task justified, and nothing structural in the way to stop it.

The fix being adopted by serious engineering teams in 2026 is not a smarter agent. It is borrowing thirty years of access-control discipline from production systems and applying it to something that, until recently, nobody thought needed it: an AI that writes and runs its own code.

1. Sandboxes, not good intentions

A sandbox gives the agent its own disposable environment, isolated from the real machine, so that whatever it does stays contained to a space that can be thrown away. Instead of letting an agent loose on your laptop, the working pattern now is to copy the repository into an ephemeral container with locked-down, unprivileged settings, let the agent do everything it needs inside that container, and only pull out the result.

By early 2026 this had become enough of a real need that infrastructure providers across the industry shipped dedicated sandbox products for exactly this use case, and entire companies now exist solely to provide isolated execution environments for agents. That is a fast shift for an entire category of infrastructure to appear from nothing, and it tells you how seriously production teams now take the question of where an agent is allowed to act.

The part people get wrong here is assuming the sandbox handles everything by itself. It does not automatically scrub secrets that get inherited into the container’s environment variables, for instance, so an agent can still leak a credential it was never supposed to see even while perfectly contained otherwise. The sandbox limits where damage can spread. It does not replace thinking about what the agent has access to in the first place.

2. Scoped, short-lived tokens instead of standing credentials

The second piece is about what the agent is handed to authenticate with, not where it runs. The old default was a long-lived API key with broad permissions, the same key a human developer might use, sitting in an environment variable for as long as the project existed.

The production pattern now is the opposite on every axis: credentials scoped to the single task at hand, expiring in minutes rather than months, and issued fresh for each run rather than reused. If a scoped, short-lived token leaks, the exposure window is small and the damage is contained to whatever that one narrow permission allowed. If a broad, permanent key leaks, the exposure window is indefinite and the damage is whatever that key could touch, which for most developer credentials is a great deal.

The underlying principle is simple to state and consistently skipped in practice: an agent that only needs to read one directory should never be holding write access to the filesystem root, and an agent doing a single bounded task should never be holding a credential that outlives the task.

3. Audit logs and runtime gating, not after-the-fact review

The third piece is visibility while the agent is acting, not just a transcript to read afterward. Mature setups now treat every tool call an agent wants to make, whether that’s writing a file, hitting an API, or installing a package, as a request that gets checked against what that agent is actually allowed to do at the moment it asks, not just logged for someone to notice later.

Paired with that is comprehensive, tamper-resistant logging of what was attempted, what succeeded, and what got blocked. Denied attempts matter as much as completed ones, because a pattern of blocked actions is often the first visible sign that something is behaving outside its expected lane, whether that’s a bug in the agent’s reasoning or an actual injection attack steering it.

This is also where a known human failure mode shows up. Sandboxing cuts the number of permission prompts a developer sees dramatically, which sounds like a pure win. In practice, fewer prompts trains people to approve reflexively, since each one feels less meaningful when most of them are routine. Teams that rely on a human clicking “approve” as their main safety check are relying on a habit that erodes precisely because the system is working well. The logging and runtime gating exist so the safety net does not depend on a tired developer’s reflexes at 11pm.

This is not an argument for distrusting AI coding tools or slowing down adoption. The teams building this infrastructure are some of the most aggressive adopters of agentic coding in the industry, not the most cautious. Sandboxing properly is what lets a team let an agent run unsupervised on a real task instead of needing a human to babysit every step, which is the entire point of using an agent in the first place.

It is also not a one-time setup. An agent’s job today is comparable to what a junior engineer’s job looked like under access control disciplines that took companies years to build properly: least privilege by default, credentials that expire, and a paper trail for what actually happened. Most teams already know how to do this for human employees. The adjustment is remembering that an agent making changes to production code deserves the same architecture, even though it never asks for a raise and never takes a vacation.

If you’re running coding agents on anything beyond a personal side project, the question worth asking isn’t “which model should I use.” It’s “what is the worst thing this agent could do right now, given everything it currently has access to.” If the honest answer is unsettling, the fix isn’t a better prompt. It’s a smaller blast radius. The Permission Problem Nobody’s Writing About was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

── more in #ai-safety 4 stories · sorted by recency
── more on @claude code 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/the-permission-probl…] indexed:0 read:7min 2026-06-25 ·