{"slug": "safe-ways-to-use-ai-agents", "title": "Safe Ways to Use AI Agents", "summary": "Renuo developers identified security risks in AI coding agents like Claude Code and OpenCode, including credential exfiltration and destructive actions via prompt injection, after incidents such as unauthorized API key access and database deletions. They propose mitigation strategies to balance developer experience with security.", "body_md": "At [Renuo](https://www.renuo.ch) we started using AI coding agents (like [Claude Code](https://code.claude.com/docs/), [OpenCode](https://opencode.ai/) or\n[Antigravity](https://antigravity.google/)) for development, and I also started using them for personal\nprojects like my [Raspberry Dashboard](https://github.com/rnestler/raspberry-dashboard). Additionally, we started building our own\nAI agent, which is integrated into [Redmine](https://www.redmine.org/), our ticketing and project\nmanagement system.\n\nAlong the way we became aware of the security risks involved. By default these agents run with full user permissions: they can read and write files, execute commands, and access credentials on the host system.\n\nJohann Rehberger's 39c3 talk [Agentic ProbLLMs: Exploiting AI Computer-Use\nand Coding Agents](https://media.ccc.de/v/39c3-agentic-probllms-exploiting-ai-computer-use-and-coding-agents) shows how prompt injection can lead to remote\ncode execution and credential exfiltration in agents like Claude Code and\nGitHub Copilot. I recommend watching it.\n\nIn this post we'll take a look at these risks and the pragmatic solutions we came up with to balance developer experience and security.\n\n# Risks of AI Agents\n\nLLMs are probabilistic -- a 1% chance of disaster makes it a matter of when, not if.\n\n-- [Agent Safehouse](https://agent-safehouse.dev/)\n\nMost AI coding agents run with the same permissions as the user who started them. 
They have access to the file system, can execute arbitrary shell commands, and inherit all credentials available in the environment. Since LLMs are susceptible to prompt injection (malicious instructions hidden in code, documentation, or web content), this creates a real attack surface.\n\nThe risks boil down to a few categories:\n\n- **Exposing credentials**: Agents have access to environment variables, config files, and credential stores. A prompt injection can trick an agent into exfiltrating API keys or access tokens to an attacker-controlled server.\n- **Malware installation**: Agents can be tricked into downloading and executing malicious code, for example through poisoned dependencies or malicious instructions in README files.\n- **Destructive actions on the local machine**: An agent might delete files, overwrite configurations, or corrupt a local database -- through prompt injection or simply by making a wrong decision.\n- **Destructive actions on remote systems**: Agents often have access to CLI tools that can interact with production infrastructure. Think `kubectl delete`, `terraform destroy`, `nctl delete`, or even just a database client connected to prod.\n\nI asked some developers which incidents they have had with AI agents so far. Here are a few examples:\n\nI don't have the session anymore, but while working on a redmine integration, it found out that I had a REDMINE_API_KEY in my ENV variables and started fetching data from our production redmine.\n\n-- Alessandro Rodi\n\nWhile it wasn't a major issue, it was frustrating when database migration errors caused the development database to be deleted and recreated, as I often lost test-data I wanted to keep.\n\n-- Bruno Costanzo\n\nWhile I was testing our claude code skill to deploy web-apps to deplo.io, the agent hit the quota limit of the number of apps in the test organization. To solve this it decided it's best to delete existing apps with `nctl delete app`. 
It did ask for confirmation though before going ahead.\n\n-- Josua Schmid\n\nWe haven't yet had a case where things went seriously wrong, mostly because we don't let the agents run unattended and because we use test environments. But it was enough to make us think seriously about how to improve the situation.\n\nFor a deeper look at these attack vectors see the [39c3 talk](https://media.ccc.de/v/39c3-agentic-probllms-exploiting-ai-computer-use-and-coding-agents)\nreferenced in the introduction.\n\n# Mitigation Strategies\n\nSo what can we do about this? It boils down to the following strategies:\n\n- **Hope**: Instruct the agent not to do destructive things.\n- **Manual approval**: Configure the agents to ask before everything.\n- **Agent specific configuration**: Forbid the agent from reading certain files or executing certain commands.\n- **Isolation**: Run the agents in VMs, Docker containers or a sandboxing tool.\n\n## Hope / Prompt Begging\n\nThe major issue with merely asking the LLM via prompting not to do destructive things is that it may simply not work: an instruction in the prompt is a request to a probabilistic model, not an enforced restriction.\n\n## Manual approval\n\nWhile manually approving everything the agent does sounds secure, in practice it\nleads to **approval fatigue**: repeatedly approving actions causes us to pay less\nattention to what we're actually approving.\n\nIt also kills productivity: constant interruptions prevent agents from running in the background.\n\nThere is also the issue of overly permissive approvals: At one point while trying\nout Antigravity, I accidentally allowed executing *every* bash command instead\nof only the one it had requested. Since the agent then just kept executing\ncommands, I had to stop it.\n\n## Agent specific configuration\n\nMost agents can be configured to allow and deny patterns of actions. 
[Claude\nCode's permission system](https://code.claude.com/docs/en/permissions), for example, lets you pattern-match shell commands:\n\n```\n{\n  \"permissions\": {\n    \"allow\": [\n      \"Bash(git commit *)\"\n    ],\n    \"deny\": [\n      \"Bash(git push *)\"\n    ]\n  }\n}\n```\n\nThis will allow `git commit` but block `git push` commands.\n\n[OpenCode's permission system](https://opencode.ai/docs/permissions/) works similarly:\n\n```\n{\n  \"$schema\": \"https://opencode.ai/config.json\",\n  \"permission\": {\n    \"bash\": {\n      \"git commit *\": \"allow\",\n      \"git push *\": \"deny\"\n    }\n  }\n}\n```\n\nThe issues with these systems are:\n\n- **Agent-specific**: There is no way to specify rules across all agents.\n- **Deny-lists can't be exhaustive**: You may specify a deny rule like `Read(.env)`, but the agent can access the same file through a bash tool: `cat .env`, `grep . .env`, `python -c \"print(open('.env').read())\"`, and so on. Deny-lists fundamentally can't cover the countless ways to access a resource.\n- **Easily overridden**: The rules live in the repository itself and may be modified during normal usage. Claude Code, for example, creates `.claude/settings.local.json` and adds it to your *global* `~/.config/git/ignore`, so changes to that file won't even show up in `git status`. And other agents will simply ignore these config files entirely.\n\n## Isolation\n\nTo really provide protection we need to externally sandbox AI agents using operating system level protection. But completely isolating agents also makes them completely useless: in the end you want them to work on your code and interact with the environment in some way. The following table illustrates the level of isolation each technique provides:\n\n| Technique | Properties | Isolation Level |\n|---|---|---|\n| VM | Full isolation. Only interaction via virtual network | Very High |\n| Containers | Kernel namespaces, cgroups. 
Limited access to files, processes and other resources | High |\n| Sandboxes | Landlock LSM / Seatbelt | Configuration Dependent |\n\nWhich sandboxing technique to use also depends on the use case:\n\n- **Local Agents**: You want to give them access to your local project and probably some development tools. So a separate VM or even a Docker container needs a lot of setup.\n- **Standalone Agent**: It will run on its own server / VM anyway. You also want it to run in a reproducible environment and not mess with the operating system. So running it inside a container makes sense.\n\nHere we'll focus on **Local Agents** as used by developers on their own\nmachines.\n\n### Devcontainers\n\nIf you are using [Devcontainers](https://containers.dev/) for your development needs, a quick way to give\nthe agent a somewhat isolated environment is to run it inside that container as\nwell. In an open-source project I help maintain we recently added exactly that:\n[https://github.com/gfroerli/api/pull/356](https://github.com/gfroerli/api/pull/356)\n\nBut at Renuo we rarely use Devcontainers for our setups: we prefer local environments, which are easier to debug and inspect.\n\n### Built-in Sandboxing\n\nSome AI agents support sandboxing in their own runtime. See [Claude Code\nSandboxing](https://code.claude.com/docs/en/sandboxing) for example. The downsides here are again:\n\n- Agent-specific\n- Hard to get the configuration right [1](#fn:1)\n- At least for Claude Code it only affects the Bash tool, not the Read and Write tools!\n\n### Specialized Sandboxing Tools\n\nIn the end we settled on the following two tools, which use kernel level sandboxing to limit what agents can do:\n\n- [Agent Safehouse](https://agent-safehouse.dev/): Uses macOS [Seatbelt](https://theapplewiki.com/wiki/Dev:Seatbelt) to only give the agent access to what it really needs.\n- [nono](https://nono.sh/): Uses [Landlock](https://landlock.io/) on Linux and [Seatbelt](https://theapplewiki.com/wiki/Dev:Seatbelt) on macOS. 
It doesn't just provide kernel isolation, but also undo & rollback, audit trail, supply chain provenance, runtime supervision and environment variable filtering.\n\nThese tools have the following characteristics:\n\n- They enforce irrevocable allow-list based blocking at the kernel level\n- They work with all agents\n- They use sane defaults to protect credentials on your system\n- Configuration is separate from the agent and can be separately tested:\n\nBy now we've already limited a lot of what an agent can do. But one thing\nremains: Credentials. The agent can still access credentials which are in the\nproject folder (like a `.env` file) or which are stored in environment\nvariables (this may be a good point to check if you have some tokens stored in\nyour `~/.bashrc`, `~/.profile`, `~/.zshrc` or wherever. We really shouldn't,\nbut sometimes we developers are lazy...).\n\nNono comes with a nice way to [filter environment variables](https://nono.sh/docs/cli/features/environment) [2](#fn:2). This isn't\nenabled by default, but we can easily create a custom profile that filters\nenvironment variables:\n\nNow `claude` will only have access to the safe subset of environment variables\nthat we allow it:\n\n## Summary\n\nIn the end it is, as usual, a tradeoff between developer convenience and security: Giving the AI agent access to everything is the most convenient, but a recipe for disaster. At Renuo we came up with the following rough guidelines:\n\n- For custom agents running independently: Use VMs, Docker containers and limit the availability of credentials as much as possible. It's better to let the AI agent do its task and then let your custom runtime push code, respond to tickets etc.\n- For developer machines: Use sandboxing tools with out-of-the-box profiles\nfor common AI agents. 
Still, don't let them run in `--yolo` / unattended mode: keep yourself in the loop, but you can be more permissive in what commands you allow them to run without confirmation. Take special care with credentials that are exposed via environment variables.\n\n1. On one occasion while testing the sandboxing feature of Claude I had it assure me that its access to a file was blocked by the sandbox, even though the sandbox couldn't be started because of missing system dependencies! [↩](#fnref:1)\n2. Which got implemented quite quickly after I proposed it: [https://github.com/always-further/nono/issues/688](https://github.com/always-further/nono/issues/688) [↩](#fnref:2)", "url": "https://wpnews.pro/news/safe-ways-to-use-ai-agents", "canonical_source": "https://blog.rnstlr.ch/safe-ways-to-use-ai-agents.html", "published_at": "2026-06-29 09:49:15+00:00", "updated_at": "2026-06-29 09:58:06.461430+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "large-language-models", "ai-ethics", "ai-research"], "entities": ["Renuo", "Claude Code", "OpenCode", "Antigravity", "Redmine", "Johann Rehberger", "GitHub Copilot", "Raspberry Dashboard"], "alternates": {"html": "https://wpnews.pro/news/safe-ways-to-use-ai-agents", "markdown": "https://wpnews.pro/news/safe-ways-to-use-ai-agents.md", "text": "https://wpnews.pro/news/safe-ways-to-use-ai-agents.txt", "jsonld": "https://wpnews.pro/news/safe-ways-to-use-ai-agents.jsonld"}}