# Agents in the Sandbox

> Source: <https://blog.railway.com/p/agents-in-the-sandbox>
> Published: 2026-06-24 13:00:00+00:00

[Railway sandboxes](https://docs.railway.com/sandboxes) now include [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview), [Codex](https://developers.openai.com/codex/cli/), [OpenCode](https://opencode.ai/docs/), and [Pi](https://pi.dev/) in their default image. If you haven't enabled them yet, sandboxes are available through [Priority Boarding](https://docs.railway.com/sandboxes). It turns out, wild as it may seem, people *really* like using sandboxes for agent workloads.

You should not have to spend the first few minutes of every sandbox session figuring out how to get the same agent harness installed that you ran in slightly different ways 20 times before. Unnecessary friction is... unnecessary. We want to make it easy to quickly stand up agents that you can use right alongside the applications and infrastructure you are already using in Railway. Fire up the agent, get your configurations in place, checkpoint, and loop.

For now, you still have to pass in your agent's configuration information or API keys using something like [`--variable`](https://docs.railway.com/cli/sandbox) and [Global Variables](https://docs.railway.com/variables), but we've got plans. Keep an eye on this space!

## More than a shell

The most obvious angle with the sandbox story we all tell is "give the agent a computer." It's useful and translates well, but we're a lot more interested in how we can give your sandboxes and their agents the entirety of Railway's infrastructure at their disposal. What if these sandboxes could immediately spin up the databases they need, or the functions they need to test against?

A sandbox on Railway shouldn't just be a shell floating off to the side. Because it lives alongside the rest of your Railway environment, it can join the same network and talk to the services alongside it. Pass [`--private-network`](https://docs.railway.com/sandboxes#networking) when you create a sandbox and it ends up on your environment's [private network](https://docs.railway.com/sandboxes#networking).

This means that not only can sandboxes do the typical "run untrusted code" things, but they can also test code against the same types of infrastructure that they will eventually use: [Postgres](https://docs.railway.com/databases), [Redis](https://docs.railway.com/databases), internal services, [variables](https://docs.railway.com/variables), whatever is in the project.

## How do you hold the Sandbox?

The loop we most commonly see users adopt looks something like:

- Create a reusable base template for the work (`railway sandbox template build`)
- Start a sandbox from that base (`railway sandbox create --template`)
- Configure it for the task: sign into agents, write guidance files, install packages, seed variables
- Checkpoint the configured state (`railway sandbox checkpoint create`)
- Create sandboxes from that checkpoint, or fork the current sandbox as the work branches (`railway sandbox create --checkpoint` / `railway sandbox fork --variable`)

We [shipped checkpoints and port forwarding](https://railway.com/changelog/2026-06-12-docker-in-sandboxes) in the CLI recently. Checkpoints let you snapshot the current state of the storage for the sandbox, and `forward` lets you connect to services running within the sandbox from your local system.

The end state of this workflow gives you a working configuration that can be forked into as many sandboxes as you need, very fast. Each new run is just another create, and you can even connect to the services running within them if needed.

## The SDK loop

So far, all the examples we've given have used the CLI, but if you're integrating sandboxes into your application, the [TypeScript SDK](https://docs.railway.com/sandboxes#typescript-sdk) is the primary programmatic interface. The SDK is [open source on GitHub](https://github.com/railwayapp/railway-ts-sdk). The quick version looks like this:

``` js
import { Sandbox } from "railway";

const sandbox = await Sandbox.create();

const result = await sandbox.exec("git --version");
console.log(result.stdout);

await sandbox.destroy();
```

`Sandbox.create()` gives you a running sandbox, ready to `exec` against.

Where it gets more interesting is when you stop treating each sandbox like a one-off machine and start treating it like a branchable work environment.

For example, [Repo Review Agent](https://github.com/codyde/repo-review-agent) is an app where a user pastes in a GitHub repo and agents review different aspects of the application. The app prepares the repo once by cloning it, installing dependencies, and checkpointing the result, then creates a new sandbox for each agent run: one for tests, one for architecture, one for security, one for product polish.

The useful loop is: boot a sandbox from a known base, clone the repo, install dependencies, run a verification step, checkpoint the prepared workspace, then create one sandbox per agent from that checkpoint. When the run is over, collect the results and destroy the temporary sandboxes.

``` js
import { Sandbox } from "railway";

// git, Node, npm, and the four agents already ship in the default image,
// so there is nothing to install up front — just create and get to work.
const sandbox = await Sandbox.create({
  idleTimeoutMinutes: 30,
  env: {
    RAILWAY_API_TOKEN: process.env.RAILWAY_API_TOKEN!,
    DATABASE_URL: "${{Postgres.DATABASE_URL}}",
  },
  networkIsolation: "PRIVATE",
});

await sandbox.exec(
  "bash -lc 'git clone https://github.com/codyde/repo-review-agent /root/workspace'",
);

await sandbox.files.write(
  "/root/workspace/AGENTS.md",
  [
    "# sandbox guidance",
    "Work in /root/workspace.",
    "Write important results to files before the turn ends.",
  ].join("\n"),
);

const install = await sandbox.exec("bash -lc 'npm ci'", {
  cwd: "/root/workspace",
});

if (install.exitCode !== 0) {
  console.error(install.stderr);
}

const check = await sandbox.exec("bash -lc 'npm run build'", {
  cwd: "/root/workspace",
  timeoutSec: 120,
});

if (check.exitCode !== 0) {
  console.error(check.stderr);
}

await sandbox.checkpoint("repo-review-agent");
```

The [`${{Postgres.DATABASE_URL}}`](https://docs.railway.com/variables#reference-syntax) syntax resolves Railway [variable references](https://docs.railway.com/variables#reference-syntax) when the sandbox is created. [`networkIsolation: "PRIVATE"`](https://docs.railway.com/sandboxes#networking) joins the sandbox to your environment's private network. The [`sandbox.files.write`](https://docs.railway.com/sandboxes#files) call writes an [`AGENTS.md`](https://docs.railway.com/agents) guidance file directly to the workspace.

The default image carries the toolchain and the agents, so the live sandbox just clones, installs, and verifies. The checkpoint captures that prepared state — every later run boots straight into it instead of repeating the setup.
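The setup example above logs a failed install or build and then carries on to the checkpoint call, so a broken workspace could end up as the reusable base. One way to guard against that is to fail fast. This is a minimal sketch: `ensureOk` is a hypothetical helper, not part of the Railway SDK, and the `ExecResult` shape is assumed from the fields the examples already read (`exitCode`, `stdout`, `stderr`).

```typescript
// Result shape assumed from the fields used in the examples above.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Hypothetical helper: throw on a non-zero exit code so the script stops
// before it reaches the checkpoint call.
function ensureOk(step: string, result: ExecResult): ExecResult {
  if (result.exitCode !== 0) {
    throw new Error(
      `${step} failed with exit code ${result.exitCode}: ${result.stderr.trim()}`,
    );
  }
  return result;
}
```

In the setup script this would wrap each step, for example `ensureOk("npm ci", install)` and `ensureOk("npm run build", check)` ahead of `sandbox.checkpoint(...)`.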

Sandboxes use [mise](https://mise.jdx.dev/) for the default toolchain, so for non-interactive `exec` calls I like to run through `bash -lc`. That gives you the same configured environment the sandbox image expects.
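Wrapping everything in `bash -lc` means nesting quotes, which gets fiddly once an agent prompt is involved. A small quoting helper can take that off your hands. This is a sketch under assumptions: `shellQuote` and `bashLc` are hypothetical names, not SDK functions, and it assumes the command string handed to `exec` is parsed with POSIX shell quoting rules, as the examples in this post suggest.

```typescript
// Hypothetical helper: wrap a value in single quotes for a POSIX shell,
// escaping any single quotes it already contains.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build a command string that runs in a login shell,
// so the mise-managed toolchain is on PATH.
function bashLc(command: string): string {
  return `bash -lc ${shellQuote(command)}`;
}

// Compose an agent invocation without hand-escaping nested quotes.
const prompt =
  "review the architecture of src/lib/review.server.ts and propose improvements";
const agentCommand = bashLc(`claude -p ${shellQuote(prompt)}`);
```

The resulting `agentCommand` string can be passed to `sandbox.exec` in place of a hand-escaped literal.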

Later, from another process, another machine, or another step in your workflow:

``` js
import { Sandbox } from "railway";

const sandbox = await Sandbox.create("repo-review-agent", {
  env: {
    ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY!,
  },
});

const agent = sandbox.exec(
  'bash -lc \'claude -p "review the architecture of src/lib/review.server.ts and propose improvements"\'',
  {
    cwd: "/root/workspace",
    onStdout: (chunk) => process.stdout.write(chunk),
    onStderr: (chunk) => process.stdout.write(chunk),
  },
);

const sessionName = await agent.sessionName;
await agent.detach();

console.log(`Agent is running in session ${sessionName}`);
```

[Long-running commands](https://docs.railway.com/sandboxes#long-running-commands) keep running in the sandbox even if your client disconnects. You get a durable session name back, and can reattach later:

``` js
const sandbox = await Sandbox.connect(process.env.SANDBOX_ID!);

await sandbox.exec(
  { sessionName: process.env.AGENT_SESSION! },
  {
    resumeFromLastRead: true,
    onStdout: (chunk) => process.stdout.write(chunk),
  },
);
```

For agent workflows, this is one of the most useful features. You are not forced to keep one local process alive forever just to keep the work going. The sandbox is the place where the work is happening. If you have a checkpoint, getting back into that state is one call away.

## Fork when the work branches

[Checkpoints](https://docs.railway.com/sandboxes#checkpoints) are great for reusable bases. [Forks](https://docs.railway.com/sandboxes#forking) are great when the work branches from something already running.

Using the example of our Repo Review Agent, we take a fresh checkpoint after our application is cloned, dependencies are installed, and everything is configured. This lets us fire off the 4 subagents that do the different application review flows.

Install dependencies once. Clone the filesystem. Try two approaches independently.

``` js
const base = await Sandbox.create("repo-review-agent");

const bugReview = await base.fork();
const securityReview = await base.fork();

await Promise.all([
  bugReview.exec(
    "bash -lc \"codex exec 'find likely bugs in src/lib/review.server.ts, then run npm run build'\"",
    { cwd: "/root/workspace" },
  ),
  securityReview.exec(
    "bash -lc \"opencode run 'audit the sandbox exec calls for command injection'\"",
    { cwd: "/root/workspace" },
  ),
]);
```

In our example, we create new sandboxes from the checkpoint, but forks are an option here too. Forks boot fresh from the disk state of the running sandbox, the same state our checkpoint originally captured. Files are preserved. Running processes are not. That is usually exactly what you want: copy the workbench, not the half-running experiment. Pick up where we left off.

## The CLI loop

For a lot of work, you do not need to write code around the sandbox at all. The [Railway CLI](https://docs.railway.com/cli) gives you the same basic shape from your terminal. See the [`railway sandbox`](https://docs.railway.com/cli/sandbox) reference for every subcommand.

The default image already ships git, Node, npm, and the four agents, so for an app like this there is nothing to pre-install; create a sandbox directly. (Reach for [`railway sandbox template build`](https://docs.railway.com/cli/sandbox#template) when you need something the image does not include.)

```
railway sandbox create --private-network
```

Run the setup:

```
railway sandbox exec -- bash -lc 'git clone https://github.com/codyde/repo-review-agent /root/workspace'
railway sandbox exec -- bash -lc 'cd /root/workspace && npm ci'
```

Checkpoint the good state:

```
railway sandbox checkpoint create repo-review-agent
```

Spin up another sandbox from it:

```
railway sandbox create --checkpoint repo-review-agent
```

Now run the agent:

```
railway sandbox exec -- bash -lc 'codex exec "review the repo for likely bugs and explain what you would change"'
```

Or forward a dev server back to your machine with [`railway sandbox forward`](https://docs.railway.com/cli/sandbox#forward):

```
railway sandbox exec --detach -- bash -lc 'cd /root/workspace && npm run dev'
railway sandbox forward 8080
```

The sandbox sets `PORT=8080`, and the app binds it, so forward `8080` to reach the dev server on `localhost:8080`.

The CLI keeps an active sandbox for the current session, so most of the time you are not carrying IDs around. Create it, exec into it, checkpoint it, fork it, destroy it.

```
railway sandbox fork
railway sandbox exec -- bash -lc 'cd /root/workspace && npm run build'
railway sandbox destroy
```

Less ceremony. More loops.

## Templates vs checkpoints vs forks

They look similar, but the differences are worth talking through.

[Templates](https://docs.railway.com/sandboxes#templates) are built from ordered shell instructions. Use them for repeatable bases: install common packages, set up language tooling, prewarm a known environment. Railway content-addresses and caches them, so rebuilding the same template is cheap.

[Checkpoints](https://docs.railway.com/sandboxes#checkpoints) capture the disk of a running sandbox into a named snapshot stored server-side in the environment. Use them after expensive live setup: cloned repo, installed deps, generated assets, migrated fixtures. You can destroy the original sandbox and still create from the checkpoint later.

[Forks](https://docs.railway.com/sandboxes#forking) clone a running sandbox into another running sandbox. Use them when the work branches right now: compare two fixes, run parallel agent attempts, split one reproduction into multiple experiments.

The combination of these gives agents a much better loop:

```
template -> sandbox -> configure -> checkpoint -> create/fork -> verify -> destroy
```

The agent does not need to rebuild the world every time. It can carry forward the useful state and throw away the rest. This loop becomes even more powerful when you look at how you can parallelize the workloads. Our earlier example showed the four agents moving through the review processes.

We're able to capture the results, have the agents return them, and destroy themselves after. If we needed to run the agents again, it would be as simple as booting up the checkpoint again and moving forward.
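The capture-and-destroy step can be sketched as a small fan-out helper. Everything named here is illustrative: `runReviews`, `SandboxLike`, `ReviewTask`, and `ReviewOutcome` are hypothetical, and the sketch assumes the SDK's sandbox object offers `exec` and `destroy` in the shape used earlier in this post. Sandbox creation is passed in as a function so the same helper works with a checkpoint or a fork.

```typescript
// Minimal structural types; the real SDK objects are assumed to satisfy them.
interface ExecResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

interface SandboxLike {
  exec(command: string, options?: { cwd?: string }): Promise<ExecResult>;
  destroy(): Promise<void>;
}

interface ReviewTask {
  label: string;
  command: string;
}

interface ReviewOutcome {
  label: string;
  ok: boolean;
  output: string;
}

// Run each task in its own sandbox, in parallel, and always destroy that
// sandbox afterwards, whether the command succeeded, failed, or threw.
async function runReviews(
  createSandbox: () => Promise<SandboxLike>,
  tasks: ReviewTask[],
): Promise<ReviewOutcome[]> {
  return Promise.all(
    tasks.map(async (task): Promise<ReviewOutcome> => {
      const sandbox = await createSandbox();
      try {
        const result = await sandbox.exec(task.command, { cwd: "/root/workspace" });
        const ok = result.exitCode === 0;
        return { label: task.label, ok, output: ok ? result.stdout : result.stderr };
      } catch (error) {
        return { label: task.label, ok: false, output: String(error) };
      } finally {
        await sandbox.destroy();
      }
    }),
  );
}
```

With the SDK, the factory would be something like `() => Sandbox.create("repo-review-agent")`. A failure to create a sandbox still rejects the whole call, which seems like the right default when the checkpoint itself is missing.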

## Why bundle the agents?

Setup friction compounds fast. We're here to beat that back.

If you create one sandbox, installing your preferred harness by hand is annoying. If you create a lot of them, or fork across multiple attempts, it becomes the thing that slows down the loop for no good reason.

[Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview), [Codex](https://developers.openai.com/codex/cli/), [OpenCode](https://opencode.ai/docs/), and [Pi](https://pi.dev/) being available by default means the sandbox is much closer to the work at creation time. Start the environment, run the agent, inspect the result, checkpoint what matters, fork when the work needs to branch.

These four cover a wide range of models and offer a strong developer experience. Pair them with [Railway Agent Skills](https://docs.railway.com/ai/agent-skills) on your local harness so the agent knows how to drive sandboxes, checkpoints, and forks from your editor too.

This is the direction we want Railway sandboxes to keep moving: less bootstrap, more useful work. Faster loops, happening right next to all the infrastructure you already have.

Agents are better when they have a computer, but agents are incredible when they have access to [all your infrastructure](https://docs.railway.com/agents).

## Further reading

- [Sandboxes documentation](https://docs.railway.com/sandboxes): concepts, SDK reference, networking, and limits
- [`railway sandbox` CLI reference](https://docs.railway.com/cli/sandbox): templates, checkpoints, forks, exec, and port forwarding
- [Railway TypeScript SDK](https://github.com/railwayapp/railway-ts-sdk): open source; scaffold with `bun create railway@latest`
- [Railway for Agents](https://docs.railway.com/agents): the broader agent setup, skills, and MCP story
- [Sandboxes changelog](https://railway.com/changelog/2026-06-05-sandboxes): original launch and dashboard workflow

Happy shipping.

*-- Cody*
