{"slug": "how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint", "title": "How We Securely Serve a Large Agent Fleet on a Small Infra Footprint", "summary": "GluonDB has developed a method to securely serve a large fleet of persistent agents on a small infrastructure footprint by decoupling orchestration, filesystem, and sandboxed execution. The company argues against the common practice of giving every agent its own virtual machine, which leads to high costs and inefficiency for always-on business agents that do not require constant execution.", "body_md": "The agent landscape changed fast.\n\nA year ago, most \"agents\" were chat apps with a tool call. Today, the useful version is closer to a persistent worker: something that remembers, wakes up on a schedule, reads from real systems, writes reports, notices changes, and sometimes executes code.\n\nIf an agent is only a chat session, you can serve it like a request. If an agent is a worker, you have to decide what stays alive when nobody is watching: files, memory, tools, schedules, or a whole sandbox.\n\nOur answer at gluonDB is simple:\n\n**The sandbox is not the agent.**\n\nThe VM should not be the unit of identity. The filesystem should not be inseparable from code execution. And the orchestration layer should not live inside the same environment it is asking untrusted model output to manipulate.\n\nWe run a large fleet of persistent agents on a small infra footprint because we split the problem into three parts that are usually bundled together:\n\n- Orchestration\n- Filesystem\n- Sandboxed execution\n\n## The VM Is the Wrong Default\n\nMost agent infrastructure ends up in one of two conversations. 
One is orchestration: frameworks like [LangGraph](https://docs.langchain.com/oss/python/langgraph/overview), the [Vercel AI SDK](https://ai-sdk.dev/docs/agents/overview), and the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/agents/) help developers build loops, tools, state, handoffs, and guardrails. The other is execution: [Firecracker](https://firecracker-microvm.github.io/) made lightweight microVMs a serious primitive, [Fly Machines](https://fly.io/docs/reference/architecture/) run apps in Firecracker microVMs, and [E2B](https://e2b.dev/) gives agents isolated sandboxes for untrusted code.\n\nBoth matter. The mistake is when they collapse into one default:\n\n**Give every agent a sandbox and make that sandbox its computer.**\n\nThat default makes sense for coding agents. If the whole product is editing repos, running tests, installing packages, and starting servers, then keeping the agent close to a shell is natural.\n\nBut data agents, reporting agents, monitoring agents, and most always-on business agents are different. They need durable working state much more often than they need live execution.\n\nThey do not need a VM burning CPU and memory all day waiting for the rare moment when the model decides to run `npm install`.\n\n## Files Are Not Execution\n\nThe subtle mistake is binding the filesystem to the sandbox.\n\nOnce you do that, the sandbox quietly becomes the agent's identity. The agent's files live there. Its scratch space lives there. Its process lives there. The harness often lives there too. Now shutting down the sandbox does not feel like stopping execution. It feels like stopping the agent.\n\nSo teams keep sandboxes warm. Costs rise. They add pooling, but the pool is still fighting the wrong abstraction if every agent is treated as the owner of a sandbox. Then come snapshots, lifecycle policy, cleanup jobs, and image management. 
Eventually they are building a small cloud provider because they wanted an agent that could write a weekly report.\n\nThe agent should have durable identity and durable files without owning a running execution environment.\n\n## Idle Machines Get Expensive\n\nAlways-on sandbox-per-agent infrastructure prices the system around the wrong bottleneck.\n\nAgent loops are not free, but the expensive part is pretending every persistent agent needs a persistent machine.\n\nIf an agent is actively coding all day, fine. Keep a machine close. But if an agent checks a database every morning, writes a report, answers questions, watches for anomalies, and occasionally runs a shell command, the VM is idle most of the time.\n\nThe industry answer is often \"microVMs are cheap.\" True. Firecracker is excellent. But cheap is not free. If you self-host microVMs, you inherit KVM, networking, images, snapshots, density, cleanup, and host constraints. [Nested virtualization has improved, including on AWS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/amazon-ec2-nested-virtualization.html), but it is still a thing you have to understand and operate.\n\nFor us, that was the wrong surface area. We wanted the agent to stay alive as an identity, not as a VM.\n\n## The Harness Should Stay Outside\n\nThe agent harness is the control system. It decides what tools exist, what credentials are available, what memory gets loaded, and what work is allowed to happen.\n\nPutting that harness inside the same sandbox used for arbitrary execution is a strange default.\n\nYes, you can harden it. But you have still moved the most sensitive part of the system into the place where the model is allowed to poke around.\n\nOur preference is stricter: the sandbox should contain risky execution, not own the agent.\n\nIf the model needs to run code, it gets an execution environment. When the code is done, that environment can go cold. 
The agent's identity, memory, tools, credentials, and lifecycle stay outside.\n\n## Split the Agent From Its Computer\n\nAt gluonDB, we split orchestration, filesystem, and execution into separate layers.\n\n### 1. The agent control plane\n\nThe control plane owns the agent loop, sessions, memory loading, cron jobs, channels, budgets, model configuration, and tool registry.\n\nIt does not need a local filesystem to do that. The agent can be persistent without being a process inside a per-agent VM.\n\nThis is what lets a gluonDB agent behave like a durable worker without first asking, \"is there a VM alive?\"\n\n### 2. Durable scoped workspaces\n\nThe filesystem layer gives every agent durable, scoped workspaces.\n\nThis is the key distinction. The filesystem is not an incidental side effect of the sandbox. It is its own layer.\n\nMost file operations are not dangerous in the same way code execution is dangerous. Reading a file, writing a report, editing a markdown note, saving an HTML dashboard, or applying a patch needs authorization, auditability, size limits, and path safety. It does not always need a VM.\n\nBy making the filesystem a service, agents keep durable working state even when no execution environment is running. A sleeping agent still has its files. A scheduled agent still has its memory. A dashboard generated yesterday still exists tomorrow.\n\n### 3. A lazy gVisor sandbox pool\n\nWhen an agent actually needs to run code or shell commands, the filesystem layer routes that work into a sandboxed execution tier.\n\nWe chose [gVisor](https://gvisor.dev/docs/) for this layer because it gives us the isolation properties we need without making us run a Firecracker control plane or depend on nested virtualization for our deployment model.\n\nThe important concept is the lazy sandbox pool.\n\nAgents do not own sandboxes. They borrow them.\n\nWhen an agent crosses from \"file and tool work\" into \"execute code,\" it gets a sandbox from the pool. 
The sandbox is temporarily bound to that agent's active workspace, runs the command, and stays warm while the agent is still doing execution-heavy work. When that burst ends, the sandbox is released for another agent.\n\nThat means one sandbox can serve many agents over time. The filesystem remains durable. The control plane remains awake. The execution environment is only occupied during the slice of time when real execution is happening.\n\nThe agent does not lose flexibility. It can still execute code when code is the right tool. The difference is that code execution becomes an on-demand capability, not the default shape of the whole agent.\n\nThat keeps the system closer to the scaling profile of a standard web app: many durable users, sessions, files, jobs, and requests sharing a smaller pool of expensive compute surfaces.\n\nThis is why running a large persistent agent fleet on a small infra footprint is not magic. We are not keeping a full execution environment alive for every agent. We are keeping agents alive as orchestrated identities with durable files and scoped tools. Sandboxes appear only when execution is actually needed.\n\n## How the Layers Meet\n\nWhen a new agent is provisioned, the gateway prepares its home workspace, grants only the access it needs, and hands the control plane the configuration for that agent.\n\nFrom there, each layer does one job.\n\nThe control plane runs the agent. It knows the home workspace, active project workspace, channels, model, and budget. It is the long-lived part.\n\nThe workspace layer handles reads, writes, edits, searches, and saved artifacts directly from durable storage. 
No sandbox needs to be running for ordinary file work.\n\nWhen the agent needs shell execution, the workspace service checks out a sandbox from the lazy pool, mounts the workspace into it, runs the command, and releases the sandbox when the execution burst is over.\n\nFor data access, the agent does not get database credentials dumped into its workspace. It goes through a scoped tool surface, with permissions enforced outside the workspace.\n\nNo component has to pretend to be all three things.\n\nThe orchestrator is not the filesystem.\n\nThe filesystem is not the sandbox.\n\nThe sandbox is not the agent.\n\nThis is not an anti-VM argument.\n\nIf you are building a coding agent, or a platform whose whole purpose is safe arbitrary code execution, a VM-shaped product can be exactly right.\n\nBut for always-on data agents and business agents, treating a VM as the default unit of identity is the wrong starting point.\n\nAgents need durable state. They need scoped tools. They need memory. They need scheduling. They need files.\n\nSometimes they need a sandbox.\n\nThose are not the same thing.\n\nThat is the bet we made with gluonDB: keep the agent alive without keeping its machine alive.\n\nSplit orchestration, filesystem, and execution. Keep the flexibility of code execution, but scale the common path like a web application. 
Let sandboxes be borrowed, not owned.\n\nThat is how we serve a large agent fleet on a small infra footprint.", "url": "https://wpnews.pro/news/how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint", "canonical_source": "https://gluondb.com/blog/how-we-securely-serve-a-large-agent-fleet", "published_at": "2026-06-24 22:05:58+00:00", "updated_at": "2026-06-24 22:13:28.666040+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools"], "entities": ["gluonDB", "LangGraph", "Vercel AI SDK", "OpenAI Agents SDK", "Firecracker", "Fly Machines", "E2B"], "alternates": {"html": "https://wpnews.pro/news/how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint", "markdown": "https://wpnews.pro/news/how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint.md", "text": "https://wpnews.pro/news/how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint.txt", "jsonld": "https://wpnews.pro/news/how-we-securely-serve-a-large-agent-fleet-on-a-small-infra-footprint.jsonld"}}