Production teams are running agent control planes now. But many hit a wall: they looked at per-pod sandbox requirements and decided the infrastructure overhead wasn't worth it.
Last month, LiteLLM Agent Platform shipped the inline harness, and it changes that math entirely. It looks like a small feature, but it's the difference between "we'll try this later" and "we can run this in production today."
Let me back up. When you run a coding agent (Claude Code, OpenCode, Cursor), you need:
The standard approach: one pod per agent or per team. Clean isolation. Obvious deployment model. But for teams with 5–20 agents across an engineering org, that's 5–20 pods. Each pod carries:
Many teams looked at this and said: "Great control plane, but we're not running 20 pods for agents. We'll stick with direct Anthropic console access."
The inline harness is a shared, inline OpenCode harness that ships as a first-class option in the harness picker, with no per-agent pod required. Skills, MCP tools, system prompts, and memory all carry over.
This means:
You get the control plane benefits (session persistence, budget enforcement, audit trails, team access, credential vault) without the per-pod cost.
The inline harness is the inflection point where production teams move from "we'll manage agents in the console" to "we'll run them on our control plane."
For teams with 5–15 agents:
You can now run all of them on a single OpenCode harness, shared across the team. Infrastructure cost drops from "5 pods × baseline overhead × regions" to "one shared harness." Agents still have session isolation, memory, scheduled execution, and full LiteLLM governance.
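To make that arithmetic concrete, here is a minimal Python sketch. It is not part of LiteLLM Agent Platform: the `Deployment` class, its field names, and the simplification that one shared harness costs about as much as one pod are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical description of an agent deployment (illustrative names only)."""
    agents: int
    regions: int
    baseline_per_pod: float  # assumed overhead of one pod, in arbitrary cost units


def per_pod_overhead(d: Deployment) -> float:
    """Per-pod model: one pod per agent in every region."""
    return d.agents * d.baseline_per_pod * d.regions


def shared_harness_overhead(d: Deployment, harnesses: int = 1) -> float:
    """Inline model: a fixed number of shared harnesses, independent of agent count.

    Assumption: a shared harness carries roughly the same baseline as one pod.
    """
    return harnesses * d.baseline_per_pod


if __name__ == "__main__":
    team = Deployment(agents=5, regions=2, baseline_per_pod=1.0)
    print("per-pod overhead:", per_pod_overhead(team))
    print("shared harness overhead:", shared_harness_overhead(team))
```

The point of the sketch is the shape of the two formulas: the per-pod figure grows with both agent count and region count, while the shared figure stays flat as agents are added. Real numbers depend on your cluster and on how much load the shared harness has to absorb.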
For teams starting with agents:
You're not forced to choose between lightweight (no control plane) and heavy (per-pod infrastructure). You start with the inline harness, get control plane benefits right away, and upgrade to per-pod if you hit the scaling ceiling.
For teams evaluating LiteLLM Agent Platform:
The objection "per-pod is too heavy for our org" is now off the table. You can deploy a lightweight, shared harness within 24 hours and gain visibility into all your agents immediately.
Let's be concrete. Suppose you're an engineering team with three coding agents:
Without inline harness (old model):
With inline harness:
The inline harness is not a replacement for per-pod isolation when you need it. If you have:
You still have the per-pod option. The inline harness is the pragmatic default for teams with standard isolation needs.
This is how production infrastructure matures:
LiteLLM Agent Platform is at step two. The inline harness removes the infrastructure objection, leaving only "is this the right control plane for my team?"
If you're evaluating agent control planes, test the inline harness first. Spend two days deploying it on your infrastructure. See what it means to have one place to create, run, and observe agents. Then decide if the per-pod model matters for your workload. Because for most teams, it won't.
The inline harness is a small feature that unlocks a big change: production teams can now run an agent control plane without per-pod infrastructure overhead.
Paul Twist — European AI engineer & technical writer. I turn messy AI infrastructure into practical guides developers can actually use. Berlin-based, focused on production agent systems and open infrastructure.
Share where your agent control plane evaluation stands in the comments: what's holding you back from running agents on your infrastructure?