Building An AI Agent Playground Before Giving It Production Access

A developer outlines a method for building an AI agent playground that intercepts tool calls before they reach production systems, allowing agents to run their full decision loop against mocked APIs. The approach sandboxes the executor rather than the model, enabling safe testing of multi-turn agent behavior and preventing costly mistakes like deleting production data.

A coding agent runs "cleanup old records" against what it thinks is a staging database. It isn't. The connection string came from an environment variable that got overwritten three deploys ago, and the agent just deleted four months of customer orders. It did exactly what it was told. It just had its hands on the wrong system.

That failure isn't an argument against agents. It's an argument against the thing almost everyone skips: a place for the agent to be wrong cheaply.

You wouldn't hand a new engineer production database credentials on their first morning and walk away. You'd give them a staging environment, a read-only replica, a code review gate, and a few weeks of supervised work. An agent deserves exactly the same onboarding, except it can take a thousand actions a minute, so the cost of skipping the playground is a thousand times higher.

This is about how to build that playground. Not a vibes-based "we tested it a bit" demo, but a real staging ground where the agent runs its full loop against fakes, where you can make tools fail on purpose, where you replay the same task until you trust the consistency, and where production access is something the agent earns rather than gets by default.

Let's be precise, because "sandbox" gets used for three different things and people talk past each other. An agent playground is an environment where the agent executes its complete decision loop (read context, reason, pick a tool, call it, read the result, decide again), but every side effect is intercepted before it reaches a real system.
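To make "intercepted" concrete, here is a minimal sketch of the idea, not the article's implementation. All names in it (`SandboxedCall`, `fakeTools`, `executeInPlayground`, and the `charge_card` and `delete_records` tools) are hypothetical and chosen only for illustration. It assumes tool calls arrive as a name plus arguments, and it shows a fake returning a plausible response while every call is recorded.

```typescript
// Hypothetical shapes for illustration; a real harness will differ.
type SandboxedCall = { name: string; args: Record<string, unknown> };
type SandboxedResult = { ok: boolean; data: unknown };
type FakeHandler = (args: Record<string, unknown>) => SandboxedResult;

// Every call the agent makes is recorded here for later inspection.
const callLog: SandboxedCall[] = [];

// Fakes return plausible responses without touching any real system.
const fakeTools: Record<string, FakeHandler> = {
  charge_card: (args) => ({
    ok: true,
    data: { chargeId: "ch_fake_001", amount: args.amount },
  }),
  delete_records: () => ({ ok: true, data: { deleted: 0 } }),
};

// The single place where a tool call would otherwise become a side effect.
function executeInPlayground(call: SandboxedCall): SandboxedResult {
  callLog.push(call);
  const handler = fakeTools[call.name];
  if (!handler) {
    // Unknown tools fail loudly instead of falling through to anything real.
    return { ok: false, data: { error: `unknown tool: ${call.name}` } };
  }
  return handler(call.args);
}

const result = executeInPlayground({ name: "charge_card", args: { amount: 1200 } });
console.log(result.ok, callLog.length);
```

The design choice worth noting is that an unrecognized tool name returns an error rather than being forwarded anywhere, so a missing fake can never turn into a real call.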
The model still thinks it's talking to your payments API. It still gets back a plausible response. The difference is that nothing it does leaves the box.

That last part matters more than it sounds. A lot of "testing" for agents is really just testing the prompt: you ask the model a question, you read its answer, you nod. But an agent's behavior isn't its first reply. It's the sequence of tool calls it makes over a dozen turns when the world pushes back. The interesting failures live in turn seven, after a tool returned something the agent didn't expect. You can't surface those by eyeballing a single response. You need the loop running end to end.

So the playground has to do three jobs at once: let the loop run for real, stop the side effects from being real, and record everything so you can inspect what happened. Get those three right and you've got somewhere the agent can fail loudly without filing an incident report.

Here's the mechanism that makes all of this work, and it's worth understanding a layer deeper than "we mock the API."

An agent loop is mechanically simple. The model emits a structured tool call: a name and some arguments. Your harness extracts that call, executes it against the real world, takes the result, appends it to the transcript, and feeds the whole thing back to the model. The loop continues until the model stops calling tools or hits a terminal state.

That "execute it against the real world" step is the only place a side effect can happen. Everything else is just text moving around. Which means you don't need to sandbox the model. You need to sandbox the executor. Put a seam right where tool calls turn into actions, and you control the entire blast radius from one place.

agent/executor.ts
// The whole loop touches the real world in exactly one spot.
type ToolCall = { name: string; args: Record