# Why Coding Agents Need Two Halves of Infrastructure: Control Plane + Fast Data Plane

> Source: <https://dev.to/paultwist/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data-plane-2eda>
> Published: 2026-06-17 16:01:30+00:00

You've probably seen the benchmarks by now: Bifrost reports about 11 microseconds of overhead per request, while the Python LiteLLM proxy adds 40-50ms. The messaging is simple: **latency matters for gateways**. But this misses what teams actually building with Claude Code and Codex have discovered: **the real problem isn't gateway latency alone. It's that coding agents need two completely different infrastructure layers, and teams are treating them like one.**

When you deploy Claude Code or Codex into a team, you're actually solving two separate problems:

**What the control plane does:** Manage agent lifecycle, sessions, memory, permissions, scheduling, tool access, and audit trails.

**What teams need here:**

**Why a gateway can't do this:** A gateway sees request-response pairs. It doesn't know that Agent-A is trying to access Customer Database and needs to be blocked, or that you want to run a code review agent every night at 2 AM. These are control decisions that live above the request layer.
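To make the "control decision" idea concrete, here is a minimal sketch of a per-agent tool allowlist with an audit trail. Everything in it (`AgentPolicy`, `ControlPlane`, the tool names) is hypothetical and invented for illustration. It is not the API of any product mentioned in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent policy record held by the control plane."""
    agent_id: str
    allowed_tools: frozenset


@dataclass
class ControlPlane:
    """Toy control plane: stores policies and records every decision."""
    policies: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, policy: AgentPolicy) -> None:
        self.policies[policy.agent_id] = policy

    def authorize(self, agent_id: str, tool: str) -> bool:
        """Return True if the agent may use the tool; log the decision."""
        policy = self.policies.get(agent_id)
        # Unknown agents are denied by default.
        allowed = policy is not None and tool in policy.allowed_tools
        self.audit_log.append((agent_id, tool, "allow" if allowed else "deny"))
        return allowed


cp = ControlPlane()
cp.register(AgentPolicy("agent-a", frozenset({"github", "test-runner"})))
print(cp.authorize("agent-a", "github"))             # True
print(cp.authorize("agent-a", "customer-database"))  # False
```

The decision depends on who the agent is and what it is allowed to touch, and neither is visible in a single request-response pair.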

**What the data plane does:** Route LLM requests to the right provider, handle fallbacks, track costs, log traffic, enforce budgets.

**Why it matters for coding agents:** Each code edit, test run, or tool invocation is an LLM call. Claude Code can make 30-50 calls per task. If your gateway adds 1ms per call, that's 30-50ms of cumulative overhead. At 40-50ms per call through a Python gateway, you're looking at 1.2-2.5 seconds of pure gateway latency on a single 30-50-call task. That's the latency you actually *feel*.
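The overhead numbers are plain multiplication, assuming the calls run sequentially so per-call overhead simply adds up. A quick sketch using the per-call figures quoted in this post (the function name is my own):

```python
def total_overhead_ms(calls: int, per_call_ms: float) -> float:
    """Cumulative gateway overhead for sequential LLM calls, in milliseconds."""
    return calls * per_call_ms


for calls in (30, 50):
    fast = total_overhead_ms(calls, 0.011)  # 11 microseconds per call
    slow_lo = total_overhead_ms(calls, 40)  # 40 ms per call
    slow_hi = total_overhead_ms(calls, 50)  # 50 ms per call
    print(f"{calls} calls: {fast:.2f} ms vs {slow_lo / 1000:.1f}-{slow_hi / 1000:.1f} s")
```

For 30 calls that is roughly a third of a millisecond against 1.2-1.5 seconds; for 50 calls it is about half a millisecond against 2.0-2.5 seconds.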

Most gateway discussions treat these as one problem: **a fast gateway that routes LLM requests**. That's necessary, but not sufficient.

Here's what I'm seeing in production teams running Claude Code:

You can't do #1-4 with a pure gateway. You can't efficiently do #5-8 with a control platform that doesn't understand routing.

Teams are currently solving this by bolting together:

That works, but it creates operational friction.

The teams I've talked to who are scaling Claude Code and Codex beyond a single developer describe it like this:

"Claude Code is incredible when it's one engineer using it locally. The moment we try to run it on a team, we need:

- A place to define 'these are our agents' (not 30 copies in 30 notebooks)
- A way to say 'this agent runs on a schedule'
- Control over which tools each agent can access
- Visibility into what each agent is doing
- The ability to route to Claude or Gemini based on task type without editing the agent
- Sub-millisecond gateway latency so 30 LLM calls don't turn into 1.5 seconds of overhead"

That's **control plane** + **data plane** thinking. Two distinct layers.
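The "route by task type without editing the agent" wish from that list boils down to a lookup table that lives in gateway config rather than in agent code. A minimal sketch, where the task types, provider labels, and default are all assumptions made up for illustration:

```python
# Hypothetical routing table: owned by the gateway, not by the agent.
ROUTES = {
    "code_edit": "claude",
    "log_summary": "gemini",
}
DEFAULT_PROVIDER = "claude"


def pick_provider(task_type: str, routes: dict = ROUTES) -> str:
    """Return the provider label for a task type, or the default if unmapped."""
    return routes.get(task_type, DEFAULT_PROVIDER)


print(pick_provider("code_edit"))    # claude
print(pick_provider("log_summary"))  # gemini
print(pick_provider("unknown"))      # claude
```

Changing which model handles a task type then means editing one table, and the agent itself never changes.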

**Example:** You define an agent in the platform UI, attach it to Claude Code, give it GitHub + AWS MCPs, schedule it to run every night on your codebase's failing tests, and it automatically creates PRs with fixes. All without anyone touching provider consoles.
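Behind a platform UI like that, the agent usually ends up as a declarative record. Here is a rough sketch of what such a record could look like. The field names, the cron expression (02:00 daily), and the `problems` check are all my assumptions, not any vendor's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical declarative agent record stored by a control plane."""
    name: str
    harness: str          # e.g. "claude-code"
    mcp_servers: tuple    # tool servers the agent may use
    schedule_cron: str    # standard 5-field cron expression
    task: str


def problems(defn: AgentDefinition) -> list:
    """Return human-readable validation problems (empty list means OK)."""
    found = []
    if len(defn.schedule_cron.split()) != 5:
        found.append("schedule_cron must have 5 fields")
    if not defn.mcp_servers:
        found.append("at least one MCP server is required")
    return found


nightly_fixer = AgentDefinition(
    name="nightly-test-fixer",
    harness="claude-code",
    mcp_servers=("github", "aws"),
    schedule_cron="0 2 * * *",  # every day at 02:00
    task="Fix failing tests and open a pull request with the changes",
)
print(problems(nightly_fixer))  # []
```

Because the definition is data, it can be reviewed, versioned, and validated before anything runs.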

**Example:** Your coding agent makes 40 calls. The gateway adds <1ms per call (under 40 ms total). The control plane tracks that this agent used Claude on 25 calls and Gemini on 15, at a cost of $0.23. If Claude hits rate limits, the gateway transparently retries on Gemini without the agent knowing.
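The fallback-plus-accounting behavior in that example can be sketched in a few lines. The providers here are stubs, not real SDK calls, and `RateLimited`, `call_with_fallback`, and the 25-call limit are hypothetical names and numbers chosen to mirror the example.

```python
from collections import Counter


class RateLimited(Exception):
    """Raised by a provider stub when it is rate limited (hypothetical)."""


def call_with_fallback(prompt: str, providers: list, usage: Counter) -> str:
    """Try providers in order; skip rate-limited ones; count who answered."""
    last_error = None
    for name, send in providers:
        try:
            reply = send(prompt)
        except RateLimited as exc:
            last_error = exc
            continue  # the agent never sees this failure
        usage[name] += 1
        return reply
    raise RuntimeError("all providers are rate limited") from last_error


def make_claude_stub(limit: int):
    """Stub that succeeds `limit` times, then raises RateLimited."""
    served = {"n": 0}

    def send(prompt: str) -> str:
        if served["n"] >= limit:
            raise RateLimited("claude")
        served["n"] += 1
        return f"claude: {prompt}"

    return send


def gemini_stub(prompt: str) -> str:
    return f"gemini: {prompt}"


usage = Counter()
providers = [("claude", make_claude_stub(25)), ("gemini", gemini_stub)]
for i in range(40):
    call_with_fallback(f"step {i}", providers, usage)
print(dict(usage))  # {'claude': 25, 'gemini': 15}
```

The retry happens in the data plane, while the `usage` counter is the kind of record the control plane would attribute to an agent for cost reporting.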

**Claude Code:** Now supports [Hooks](https://docs.anthropic.com/en/docs/agents/hooks) for policy enforcement at key lifecycle events (such as PreToolUse, PostToolUse, and Stop). That's control-plane-level thinking inside the agent harness.

**Codex:** Added [Managed Agents API](https://openai.com/blog/api-updates-preview) for creating/running agents from your own infrastructure. That's recognizing the control plane problem.

**LiteLLM-Rust:** Launched June 2026 specifically for "coding agent workloads" with <1ms target on Claude Code calls, integrated sandbox support (E2B, Daytona), and durable sessions on the roadmap. That's explicitly targeting the data plane + agent runtime integration.

**TrueFoundry, Kong, Portkey:** All shipping "agent gateway" features that blur the line — they're trying to build control + data in one platform.

The market is recognizing that governance + routing are different concerns, even if some platforms try to unify them.

**If you're running Claude Code for a single developer or a small team:**

**If you're scaling Claude Code or Codex across a team:**

**If you're mixing multiple harnesses (Claude Code + Codex + OpenCode):**

**If you need compliance/audit/data residency:**

When evaluating control + data infrastructure for coding agents:

**Control Plane:**

**Data Plane:**

**Integration:**

The sexiest discussion is always about latency. But in teams I talk to running Claude Code at scale, the conversation goes:

"Okay, gateway overhead is solved. Now: how do we keep prod from running experiments? How do we keep Agent-A from accessing the customer database? How do I know what happened yesterday when Agent-B deleted something? Can I run this every night? Can I give the junior engineer the ability to create agents without giving them API access?"

That's all control plane work. It's unglamorous, but when it's missing, it's what keeps you up at 3 AM.

Coding agents (Claude Code, Codex, OpenCode) need **two halves**:

A fast gateway solves problem #2. A control platform solves problem #1. **Both are table stakes for production.**

The platforms that will win here are the ones that make the separation clear and let teams pick the right tool for each job — or that do both well without unnecessary coupling.

**What's your experience been?** Are you running coding agents on a team? What's the first thing that broke when you tried to scale from one developer to five?

*Paul Twist is an AI infrastructure engineer based in Berlin, focused on production agent systems and multi-provider LLM routing.*
