How We Securely Serve a Large Agent Fleet on a Small Infra Footprint

GluonDB has developed a method to securely serve a large fleet of persistent agents on a small infrastructure footprint by decoupling orchestration, filesystem, and sandboxed execution. The company argues against the common practice of giving every agent its own virtual machine, which leads to high costs and inefficiency for always-on business agents that do not require constant execution.

The agent landscape changed fast. A year ago, most "agents" were chat apps with a tool call. Today, the useful version is closer to a persistent worker: something that remembers, wakes up on a schedule, reads from real systems, writes reports, notices changes, and sometimes executes code.

If an agent is only a chat session, you can serve it like a request. If an agent is a worker, you have to decide what stays alive when nobody is watching: files, memory, tools, schedules, or a whole sandbox.

Our answer at gluonDB is simple:

The sandbox is not the agent.

The VM should not be the unit of identity. The filesystem should not be inseparable from code execution. And the orchestration layer should not live inside the same environment it is asking untrusted model output to manipulate.

We run a large fleet of persistent agents on a small infra footprint because we split the problem into three parts that are usually bundled together:

- Orchestration
- Filesystem
- Sandboxed execution

The VM Is the Wrong Default

Most agent infrastructure ends up in one of two conversations. One is orchestration: frameworks like LangGraph (https://docs.langchain.com/oss/python/langgraph/overview), the Vercel AI SDK (https://ai-sdk.dev/docs/agents/overview), and the OpenAI Agents SDK (https://openai.github.io/openai-agents-python/agents/) help developers build loops, tools, state, handoffs, and guardrails.
The other is execution: Firecracker (https://firecracker-microvm.github.io/) made lightweight microVMs a serious primitive, Fly Machines (https://fly.io/docs/reference/architecture/) run apps in Firecracker microVMs, and E2B (https://e2b.dev/) gives agents isolated sandboxes for untrusted code.

Both matter. The mistake is when they collapse into one default:

Give every agent a sandbox and make that sandbox its computer.

That default makes sense for coding agents. If the whole product is editing repos, running tests, installing packages, and starting servers, then keeping the agent close to a shell is natural.

But data agents, reporting agents, monitoring agents, and most always-on business agents are different. They need durable working state much more often than they need live execution. They do not need a VM burning CPU and memory all day waiting for the rare moment when the model decides to run npm install.

Files Are Not Execution

The subtle mistake is binding the filesystem to the sandbox. Once you do that, the sandbox quietly becomes the agent's identity. The agent's files live there. Its scratch space lives there. Its process lives there. The harness often lives there too.

Now shutting down the sandbox does not feel like stopping execution. It feels like stopping the agent.

So teams keep sandboxes warm. Costs rise. They add pooling, but the pool is still fighting the wrong abstraction if every agent is treated as the owner of a sandbox. Then come snapshots, lifecycle policy, cleanup jobs, and image management. Eventually they are building a small cloud provider because they wanted an agent that could write a weekly report.

The agent should have durable identity and durable files without owning a running execution environment.

Idle Machines Get Expensive

Always-on sandbox-per-agent infrastructure prices the system around the wrong bottleneck. Agent loops are not free, but the expensive part is pretending every persistent agent needs a persistent machine.
If an agent is actively coding all day, fine. Keep a machine close. But if an agent checks a database every morning, writes a report, answers questions, watches for anomalies, and occasionally runs a shell command, the VM is idle most of the time.

The industry answer is often "microVMs are cheap." True. Firecracker is excellent. But cheap is not free. If you self-host microVMs, you inherit KVM, networking, images, snapshots, density, cleanup, and host constraints. Nested virtualization has improved, including on AWS (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-ec2-nested-virtualization.html), but it is still a thing you have to understand and operate.

For us, that was the wrong surface area. We wanted the agent to stay alive as an identity, not as a VM.

The Harness Should Stay Outside

The agent harness is the control system. It decides what tools exist, what credentials are available, what memory gets loaded, and what work is allowed to happen.

Putting that harness inside the same sandbox used for arbitrary execution is a strange default. Yes, you can harden it. But you have still moved the most sensitive part of the system into the place where the model is allowed to poke around.

Our preference is stricter: the sandbox should contain risky execution, not own the agent. If the model needs to run code, it gets an execution environment. When the code is done, that environment can go cold. The agent's identity, memory, tools, credentials, and lifecycle stay outside.

Split the Agent From Its Computer

At gluonDB, we split orchestration, filesystem, and execution into separate layers.

1. The agent control plane

The control plane owns the agent loop, sessions, memory loading, cron jobs, channels, budgets, model configuration, and tool registry. It does not need a local filesystem to do that. The agent can be persistent without being a process inside a per-agent VM.
This is what lets a gluonDB agent behave like a durable worker without first asking, "is there a VM alive?"

2. Durable scoped workspaces

The filesystem layer gives every agent durable, scoped workspaces. This is the key distinction. The filesystem is not an incidental side effect of the sandbox. It is its own layer.

Most file operations are not dangerous in the same way code execution is dangerous. Reading a file, writing a report, editing a markdown note, saving an HTML dashboard, or applying a patch needs authorization, auditability, size limits, and path safety. It does not always need a VM.

By making the filesystem a service, agents keep durable working state even when no execution environment is running. A sleeping agent still has its files. A scheduled agent still has its memory. A dashboard generated yesterday still exists tomorrow.

3. A lazy gVisor sandbox pool

When an agent actually needs to run code or shell commands, the filesystem layer routes that work into a sandboxed execution tier. We chose gVisor (https://gvisor.dev/docs/) for this layer because it gives us the isolation properties we need without making us run a Firecracker control plane or depend on nested virtualization for our deployment model.

The important concept is the lazy sandbox pool. Agents do not own sandboxes. They borrow them.

When an agent crosses from "file and tool work" into "execute code," it gets a sandbox from the pool. The sandbox is temporarily bound to that agent's active workspace, runs the command, and stays warm while the agent is still doing execution-heavy work. When that burst ends, the sandbox is released for another agent.

That means one sandbox can serve many agents over time. The filesystem remains durable. The control plane remains awake. The execution environment is only occupied during the slice of time when real execution is happening.

The agent does not lose flexibility. It can still execute code when code is the right tool.
The difference is that code execution becomes an on-demand capability, not the default shape of the whole agent. That keeps the system closer to the scaling profile of a standard web app: many durable users, sessions, files, jobs, and requests sharing a smaller pool of expensive compute surfaces.

This is why running a large persistent agent fleet on a small infra footprint is not magic. We are not keeping a full execution environment alive for every agent. We are keeping agents alive as orchestrated identities with durable files and scoped tools. Sandboxes appear only when execution is actually needed.

How the Layers Meet

When a new agent is provisioned, the gateway prepares its home workspace, grants only the access it needs, and hands the control plane the configuration for that agent. From there, each layer does one job.

The control plane runs the agent. It knows the home workspace, active project workspace, channels, model, and budget. It is the long-lived part.

The workspace layer handles reads, writes, edits, searches, and saved artifacts directly from durable storage. No sandbox needs to be running for ordinary file work.

When the agent needs shell execution, the workspace service checks out a sandbox from the lazy pool, mounts the workspace into it, runs the command, and releases the sandbox when the execution burst is over.

For data access, the agent does not get database credentials dumped into its workspace. It goes through a scoped tool surface, with permissions enforced outside the workspace.

No component has to pretend to be all three things. The orchestrator is not the filesystem. The filesystem is not the sandbox. The sandbox is not the agent.

This is not an anti-VM argument. If you are building a coding agent, or a platform whose whole purpose is safe arbitrary code execution, a VM-shaped product can be exactly right. But for always-on data agents and business agents, treating a VM as the default unit of identity is the wrong starting point.
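
The checkout, mount, run, and release flow described under "How the Layers Meet" can be illustrated with a small in-memory sketch. This is not gluonDB's implementation: the names `Sandbox`, `LazySandboxPool`, `checkout`, and `max_size` are hypothetical, the sandbox only records commands instead of executing them inside gVisor, and the choice to block when the pool is exhausted is an assumption made for the example.

```python
import threading
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Sandbox:
    """Hypothetical stand-in for one isolated execution environment."""
    sandbox_id: int
    mounted_workspace: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def run(self, command: str) -> str:
        # A real sandbox would execute the command in isolation; this only records it.
        self.history.append(f"{self.mounted_workspace}: {command}")
        return f"[sandbox {self.sandbox_id}] ran {command!r} in {self.mounted_workspace}"


class LazySandboxPool:
    """Creates sandboxes on first demand (up to max_size) and lends them to agents."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._idle: List[Sandbox] = []
        self._created = 0
        self._cond = threading.Condition()

    @property
    def created(self) -> int:
        return self._created

    @contextmanager
    def checkout(self, workspace: str) -> Iterator[Sandbox]:
        with self._cond:
            # Assumption: callers wait when every sandbox is busy and the pool is full.
            while not self._idle and self._created >= self._max_size:
                self._cond.wait()
            if self._idle:
                sandbox = self._idle.pop()
            else:
                self._created += 1  # lazy: nothing exists until someone needs it
                sandbox = Sandbox(sandbox_id=self._created)
        sandbox.mounted_workspace = workspace  # bind to the borrowing agent's workspace
        try:
            yield sandbox
        finally:
            sandbox.mounted_workspace = None  # unbind before another agent borrows it
            with self._cond:
                self._idle.append(sandbox)
                self._cond.notify()


if __name__ == "__main__":
    pool = LazySandboxPool(max_size=1)
    with pool.checkout("agent-a/home") as sb:
        print(sb.run("python report.py"))
    with pool.checkout("agent-b/home") as sb:
        print(sb.run("ls"))
    print("sandboxes created:", pool.created)
```

In this sketch two agents run commands one after the other against a pool that only ever creates one sandbox, which is the "borrowed, not owned" property in miniature. A production pool would also need workspace mounting, cleanup between tenants, and idle timeouts, none of which are modeled here.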
Agents need durable state. They need scoped tools. They need memory. They need scheduling. They need files. Sometimes they need a sandbox.

Those are not the same thing.

That is the bet we made with gluonDB: keep the agent alive without keeping its machine alive. Split orchestration, filesystem, and execution. Keep the flexibility of code execution, but scale the common path like a web application. Let sandboxes be borrowed, not owned.

That is how we serve a large agent fleet on a small infra footprint.
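
As a closing illustration of the "durable scoped workspaces" layer, here is a minimal sketch of file work that needs path safety and a size limit but no sandbox. It is an assumption-laden example, not gluonDB's service: `ScopedWorkspace`, `WorkspaceError`, and `max_bytes` are hypothetical names, storage is a local directory rather than a durable storage service, and authorization and audit logging are left out.

```python
import tempfile
from pathlib import Path


class WorkspaceError(Exception):
    """Raised when a file operation violates the workspace's scope or limits."""


class ScopedWorkspace:
    """Hypothetical per-agent workspace rooted at one directory."""

    def __init__(self, root: Path, max_bytes: int = 1_000_000) -> None:
        root.mkdir(parents=True, exist_ok=True)
        self._root = root.resolve()
        self._max_bytes = max_bytes

    def _resolve(self, relative_path: str) -> Path:
        # Resolve "..", absolute paths, and symlinks, then require the result
        # to stay inside the workspace root.
        candidate = (self._root / relative_path).resolve()
        if candidate != self._root and self._root not in candidate.parents:
            raise WorkspaceError(f"path escapes workspace: {relative_path}")
        return candidate

    def write(self, relative_path: str, text: str) -> None:
        if len(text.encode("utf-8")) > self._max_bytes:
            raise WorkspaceError("file exceeds size limit")
        target = self._resolve(relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")

    def read(self, relative_path: str) -> str:
        return self._resolve(relative_path).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = ScopedWorkspace(Path(tmp) / "agent-a")
        ws.write("reports/weekly.md", "# Weekly report")
        print(ws.read("reports/weekly.md"))
        try:
            ws.write("../escape.txt", "nope")
        except WorkspaceError as err:
            print("rejected:", err)
```

The point of the sketch is only that ordinary reads and writes can be guarded by checks in the service itself, so no execution environment has to be running for them.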