[ARTICLE · art-17643] src=dev.to pub= topic=ai-safety verified=true sentiment=↓ negative

How to Run Untrusted AI Agent Code Without Docker

A developer has outlined a method for running untrusted AI agent code without Docker, using hardware-level isolation via Firecracker microVMs or Kata Containers, with gVisor as a lighter userspace-kernel option, to avoid shared-kernel vulnerabilities. The approach addresses the risk of prompt injection combined with kernel and runtime exploits such as CVE-2024-1086 and the November 2025 runC CVEs, at a time when frontier models have been shown to escape misconfigured containers autonomously. The solution also includes egress filtering, user namespace mapping, and Falco-based detection to prevent exfiltration and silent compromise.

4 min read · published May 29, 2026

Docker shares the host kernel. That was always the trade. It was fine when a human read the script before it ran. It stopped being fine the second an LLM started writing code at runtime off a prompt nobody pre-screened. So here's the practitioner version: what to actually run when your agent executes code you've never seen.

The review step is gone. A model writes a script, the script lives for milliseconds, then it executes. Could be a clean chart. Could be a curl-pipe-shell because a prompt injection rewired intent four hops upstream. You don't get to read it first.

And the container under it shares one kernel with every other workload on the box. CVE-2024-1086, a netfilter use-after-free, owns every container on the host once it pops. CISA confirmed active ransomware exploitation in late 2025, well over a year after the patch. November 2025 dropped three more under runC (CVE-2025-31133, CVE-2025-52565, CVE-2025-52881), which race symlinks and bind mounts during container setup, bypassing maskedPaths in the first case, to reach writable procfs gadgets. Own core_pattern and the kernel runs your binary on the next coredump, as root.

In March 2026, Oxford and the UK AISI shipped SandboxEscapeBench. Frontier models reliably escaped privileged containers, writable host mounts, and exposed Docker daemons on their own. Cost per attempt: roughly a dollar. The model does the recon, picks the CVE, hands back the shell. So the fix isn't a better Docker config. It's a different boundary.

If the code came from an untrusted prompt, it doesn't belong on a shared kernel. You want a hardware boundary.

Firecracker is what AWS runs Lambda and Fargate on. Each workload gets its own dedicated kernel in a microVM, boots in ~125ms, tiny hypervisor surface. Every kernel CVE that owns Docker stops dead at the hypervisor.

pip install e2b-code-interpreter
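Here is a minimal sketch of how that package might be wired in, assuming the `e2b_code_interpreter` SDK's `Sandbox` class with its `run_code` and `kill` methods and an `E2B_API_KEY` set in the environment. The names `run_untrusted`, `RunResult`, and `sandbox_factory` are hypothetical, and the constructor call differs between SDK versions, so check the docs for the release you install.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class RunResult:
    stdout: str
    error: Optional[str]


def _e2b_sandbox() -> Any:
    # Imported lazily so the wrapper can be exercised without the SDK installed.
    from e2b_code_interpreter import Sandbox

    # Assumption: v1-style constructor; newer SDK releases expose Sandbox.create().
    return Sandbox()


def run_untrusted(code: str, sandbox_factory: Callable[[], Any] = _e2b_sandbox) -> RunResult:
    """Run model-generated code in a fresh sandbox and always tear it down."""
    sandbox = sandbox_factory()
    try:
        execution = sandbox.run_code(code)
        stdout = "".join(execution.logs.stdout)
        error = execution.error.value if execution.error else None
        return RunResult(stdout=stdout, error=error)
    finally:
        # One sandbox per snippet: never reuse a microVM across untrusted runs.
        sandbox.kill()
```

The factory parameter is only there so the teardown logic can be tested with a stand-in object. The part that matters is the `finally`: the sandbox dies even when the generated code throws.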

For the jailer config: seccomp on, drop all capabilities, run as a dedicated non-root jailer user, pin CPUs so a noisy neighbor doesn't melt throughput.
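As a rough sketch of the non-root user and CPU pinning parts, the helper below builds a jailer command line. It assumes the `--id`, `--exec-file`, `--uid`, `--gid`, `--chroot-base-dir`, `--cgroup-version`, and `--cgroup` flags from the Firecracker jailer docs; the function name, binary path, and IDs are hypothetical, so confirm the flags against the release you run. Seccomp stays on as long as you don't pass Firecracker's `--no-seccomp`.

```python
from typing import List


def jailer_argv(vm_id: str, uid: int, gid: int, cpu: int,
                firecracker_bin: str = "/usr/bin/firecracker",
                chroot_base: str = "/srv/jailer") -> List[str]:
    """Build a jailer command line that drops to a non-root user and pins one CPU."""
    if uid == 0 or gid == 0:
        raise ValueError("jailer must drop to a non-root uid/gid")
    return [
        "jailer",
        "--id", vm_id,
        "--exec-file", firecracker_bin,       # hypothetical install path
        "--uid", str(uid),                    # dedicated unprivileged jailer user
        "--gid", str(gid),
        "--chroot-base-dir", chroot_base,
        "--cgroup-version", "2",
        "--cgroup", f"cpuset.cpus={cpu}",     # pin the microVM to a single CPU
    ]
```

Hand the list to `subprocess.run` rather than a shell string, so nothing in `vm_id` ever gets interpreted by a shell.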

Kata Containers when you need OCI image compatibility. Wraps standard images in a per-workload microVM. Pair with QEMU or Cloud Hypervisor.

spec:
  runtimeClassName: kata-qemu   # per-workload microVM
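For context, here is a sketch of a full Pod manifest around that field, built as a Python dict and dumped as JSON (which kubectl accepts). It assumes a RuntimeClass named `kata-qemu` already exists on the cluster; the pod name, container name, and image are placeholders, and the extra hardening fields are my additions, not part of the fragment above.

```python
import json
from typing import Any, Dict


def kata_pod(name: str, image: str, runtime_class: str = "kata-qemu") -> Dict[str, Any]:
    """Return a Pod manifest that runs one container inside a Kata microVM."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class,        # per-workload microVM
            "automountServiceAccountToken": False,    # no cluster credentials inside
            "restartPolicy": "Never",
            "containers": [{
                "name": "agent",
                "image": image,
                "securityContext": {
                    "runAsNonRoot": True,
                    "allowPrivilegeEscalation": False,
                    "capabilities": {"drop": ["ALL"]},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(kata_pod("agent-run", "your-image"), indent=2))
```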

gVisor when the workload is compute-heavy and the input is trusted-ish. Modal runs it in prod for serverless GPU agents. The Sentry intercepts syscalls in userspace. It won't survive every kernel-tier exploit, but it kills the easy ones.

# The KVM platform is a runsc flag, set as runtimeArgs ["--platform=kvm"] in /etc/docker/daemon.json
docker run --rm --runtime=runsc your-image
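The gVisor platform is chosen by a flag on `runsc` itself, which Docker passes through the `runtimes` section of its daemon config. A small sketch of merging that entry into an existing config dict, assuming `runsc` is installed at `/usr/local/bin/runsc` (adjust to your install; the function name is hypothetical):

```python
from typing import Any, Dict


def with_runsc_runtime(daemon_cfg: Dict[str, Any],
                       runsc_path: str = "/usr/local/bin/runsc",
                       platform: str = "kvm") -> Dict[str, Any]:
    """Return a copy of a Docker daemon config with a runsc runtime registered."""
    merged = dict(daemon_cfg)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["runsc"] = {
        "path": runsc_path,
        "runtimeArgs": [f"--platform={platform}"],  # gVisor platform, not the image platform
    }
    merged["runtimes"] = runtimes
    return merged
```

Write the result to `/etc/docker/daemon.json` with `json.dump` and restart the Docker daemon for it to take effect.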

Isolation handles the local box. Egress handles exfil. Half the production sandboxes I audit ship allow-all outbound, which means a compromised agent phones home to C2 or smuggles tokens out a Markdown image tag and nobody notices.

Block everything by default. Allowlist only the endpoints the agent actually needs (the model API, the tool API). On Kata, attach the network namespace to a Cilium L7 policy that denies everything except those hosts. Tunneling, exfil, and callbacks all die at the wall when there is one.
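The same default-deny logic can be mirrored in whatever layer dispatches the agent's tool calls, as a second check behind the network policy and not a replacement for it. A minimal sketch, with hypothetical hostnames standing in for your model and tool APIs:

```python
from typing import FrozenSet
from urllib.parse import urlsplit

# Hypothetical allowlist: swap in the hosts your agent actually needs.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"api.model.example", "tools.internal.example"})


def egress_allowed(url: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Default deny: only HTTPS requests to an exactly matching host pass."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname drops any userinfo and port, so "allowed@evil" tricks don't match.
    host = (parts.hostname or "").lower().rstrip(".")
    return host in allowed
```

Exact matching is deliberate. Suffix or substring checks are how `api.model.example.evil.example` gets through.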

Hardware isolation is the floor, not an excuse to run stale runC underneath it.

runc --version

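Most procfs gadget writes need root on the host. User namespaces take that away by mapping root inside the container to an unprivileged UID outside it. Patch level matters just as much: the 1.1.x line is end of life and unpatched against the November 2025 CVEs, so if you're there, you're exposed.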
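To turn the version check into something a fleet audit can run, here is a sketch that parses `runc --version` output and classifies it. The patched floors in `MIN_PATCHED` (1.2.8 and 1.3.3) are my understanding of the fixed releases, not something stated above, so verify them against the runc security advisories before relying on them. Function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumption: first patched patch-level per supported minor line. Verify before use.
MIN_PATCHED: Dict[Tuple[int, int], int] = {(1, 2): 8, (1, 3): 3}


def parse_runc_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the first line of `runc --version`."""
    match = re.search(r"runc version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find a runc version string")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def runc_status(output: str) -> str:
    """Classify a runc build as 'eol', 'outdated', 'ok', or 'unknown'."""
    major, minor, patch = parse_runc_version(output)
    if (major, minor) <= (1, 1):
        return "eol"        # 1.1.x and older: no fix is coming
    floor = MIN_PATCHED.get((major, minor))
    if floor is None:
        return "unknown"    # a line this table does not cover; check manually
    return "ok" if patch >= floor else "outdated"
```

Feed it `subprocess.run(["runc", "--version"], capture_output=True, text=True).stdout` on each host and alert on anything that isn't `ok`.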

Isolation fails silently. Detection tells you when. Deploy Falco or Sysdig Secure with a rule for procfs symlink creation (the runC escape signature), plus rules for agent-typical anomalies: outbound TCP to non-allowlisted hosts, writes to /etc/, processes spawning nc or socat.

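- rule: Create Symlink Over Procfs Files
  desc: runC container escape via procfs symlink (CVE-2025-31133 / CVE-2025-52565)
  condition: create_symlink and evt.arg.target in ("/proc/sysrq-trigger","/proc/sys/kernel/core_pattern")
  output: Symlink created over procfs file (target=%evt.arg.target linkpath=%evt.arg.linkpath user=%user.name container=%container.id)
  priority: CRITICAL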

Pipe critical alerts to a channel a human reads at 3am.
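One way to do the piping is a small filter in front of your pager, sketched below. It assumes Falco is running with JSON output enabled, so each alert arrives as one JSON object per line carrying `priority` and `rule` fields; the function name and the set of priorities worth a page are my choices.

```python
import json
from typing import Iterable, Iterator

# Priorities worth waking someone up for; everything else stays in the log.
PAGE_PRIORITIES = {"emergency", "alert", "critical"}


def events_to_page(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only the Falco events whose priority warrants a page."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log noise
        if not isinstance(event, dict):
            continue
        if str(event.get("priority", "")).lower() in PAGE_PRIORITIES:
            yield event
```

Run it as `for event in events_to_page(sys.stdin): notify(event)`, where `notify` is whatever hypothetical hook posts to the on-call channel.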

Docker default is not a sandbox for model-generated code from untrusted prompts. Firecracker or Kata for hostile input, gVisor for trusted-ish compute, default-deny egress on all of it, patched runC with user namespaces underneath, Falco watching. Ship that today and you've moved the boundary from "shared kernel" to "hardware."

I wrote the full breakdown including the autonomous ROME breakout and the system-prompt contract that hardens agents against instrumental convergence over on the ToxSec Substack.

ToxSec covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand. Run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering.
