Source: dev.to · Topic: ai-agents

Why Coding Agents Need Two Halves of Infrastructure: Control Plane + Fast Data Plane

A developer argues that coding agents like Claude Code and Codex require two distinct infrastructure layers: a control plane for agent lifecycle, permissions, and scheduling, and a fast data plane for sub-millisecond LLM request routing. The post highlights that current solutions often conflate these needs, leading to operational friction and latency overhead that compounds across multiple LLM calls per task.

4 min read · Published Jun 17, 2026

You've probably seen the benchmarks by now. Bifrost does 11 microseconds. LiteLLM Python does 40-50ms. The messaging is simple: latency matters for gateways. But this misses what teams actually building with Claude Code and Codex have discovered: the real problem isn't gateway latency alone. It's that coding agents need two completely different infrastructure layers, and teams are treating them like one.

When you deploy Claude Code or Codex into a team, you're actually solving two separate problems:

1. The control plane

What it does: Manage agent lifecycle, sessions, memory, permissions, scheduling, tool access, and audit trails.

What teams need here:

Why a gateway can't do this: A gateway sees request-response pairs. It doesn't know that Agent-A is trying to access Customer Database and needs to be blocked, or that you want to run a code review agent every night at 2 AM. These are control decisions that live above the request layer.
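A minimal sketch of the kind of decision that lives above the request layer: a per-agent tool permission check that also writes an audit record. The `AgentPolicy` and `ControlPlane` names, the agent IDs, and the tool labels are all hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy record held by the control plane."""
    agent_id: str
    allowed_tools: frozenset


@dataclass
class ControlPlane:
    policies: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, agent_id: str, tool: str) -> bool:
        """Decide whether an agent may use a tool, and record the decision."""
        policy = self.policies.get(agent_id)
        # Agents without a policy are denied by default.
        allowed = policy is not None and tool in policy.allowed_tools
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "tool": tool,
            "allowed": allowed,
        })
        return allowed


plane = ControlPlane(policies={
    "agent-a": AgentPolicy("agent-a", frozenset({"github"})),
})
print(plane.authorize("agent-a", "github"))       # allowed by policy
print(plane.authorize("agent-a", "customer-db"))  # not in the allow list
```

A gateway that sees only request-response pairs has no `agent_id`-to-tool mapping to consult, which is why this check sits in a separate layer.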

2. The data plane

What it does: Route LLM requests to the right provider, handle fallbacks, track costs, log traffic, enforce budgets.

Why it matters for coding agents: Each code edit, test run, or tool invocation involves an LLM call. Claude Code can make 30-50 calls per task. If your gateway adds 1ms per call, that's 30-50ms of cumulative overhead. At 40-50ms per call through a Python gateway, you're looking at 1.2-2.5 seconds of pure gateway latency on a single task of 30-50 calls. That's the latency you actually feel.
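The arithmetic is a plain product of call count and per-call overhead, assuming the calls run sequentially. The sketch below works it out for the figures quoted in this post; the function name is made up for illustration.

```python
def gateway_overhead_ms(calls: int, per_call_ms: float) -> float:
    """Total added latency for sequential calls: calls times per-call overhead."""
    return calls * per_call_ms


# Per-call overheads quoted in the post, in milliseconds.
scenarios = [
    ("11 microsecond gateway", 0.011),
    ("1 ms gateway", 1.0),
    ("40 ms gateway", 40.0),
    ("50 ms gateway", 50.0),
]

for label, per_call_ms in scenarios:
    low = gateway_overhead_ms(30, per_call_ms)   # 30-call task
    high = gateway_overhead_ms(50, per_call_ms)  # 50-call task
    print(f"{label}: {low:.2f} ms to {high:.2f} ms per task")
```

The low end of the 1.2-2.5 second range is 30 calls at 40ms each, and the high end is 50 calls at 50ms each.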

Most gateway discussions treat these as one problem: build a fast gateway that routes LLM requests. That's necessary, but not sufficient.

Here's what I'm seeing in production teams running Claude Code:

You can't do #1-4 with a pure gateway. You can't efficiently do #5-8 with a control platform that doesn't understand routing.

Teams are currently solving this by bolting together:

That works, but it creates operational friction.

The teams I've talked to who are scaling Claude Code and Codex beyond a single developer describe it like this:

"Claude Code is incredible when it's one engineer using it locally. The moment we try to run it on a team, we need:

  • A place to define 'these are our agents' (not 30 copies in 30 notebooks)
  • A way to say 'this agent runs on a schedule'
  • Control over which tools each agent can access
  • Visibility into what each agent is doing
  • The ability to route to Claude or Gemini based on task type without editing the agent
  • Sub-millisecond gateway latency so 30 LLM calls don't turn into 1.5 seconds of overhead"

That's control plane + data plane thinking. Two distinct layers.
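One way to read "route by task type without editing the agent" is a routing table that lives in the gateway. The sketch below assumes the agent attaches only a task label to each request; the labels, provider names, and function names are illustrative rather than taken from any real gateway.

```python
# Hypothetical routing table owned by the gateway, not by the agent.
ROUTES = {
    "code_edit": "claude",
    "test_summary": "gemini",
}
DEFAULT_PROVIDER = "claude"


def pick_provider(task_type: str) -> str:
    """Return the provider for a task type, falling back to the default."""
    return ROUTES.get(task_type, DEFAULT_PROVIDER)


def agent_step(task_type: str, prompt: str) -> dict:
    """The agent only labels its request; it never names a provider."""
    return {"provider": pick_provider(task_type), "prompt": prompt}


print(agent_step("code_edit", "rename this function"))
print(agent_step("test_summary", "summarize the failing tests"))
```

Changing an entry in `ROUTES` reroutes that task type, and `agent_step` stays untouched.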

Example: You define an agent in the platform UI, attach it to Claude Code, give it GitHub + AWS MCPs, schedule it to run every night on your codebase's failing tests, and it automatically creates PRs with fixes. All without anyone touching provider consoles.
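In code, the definition from this example might be a small declarative record plus a scheduler check. This is a sketch under stated assumptions: `AgentSpec`, its fields, and `is_due` are hypothetical, and a real platform would more likely use cron expressions and persistent run state.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical declarative agent definition stored by the control plane."""
    name: str
    harness: str         # e.g. "claude-code"
    mcp_servers: tuple   # MCP servers attached to the agent
    run_hour: int        # daily run time, 24h clock, scheduler's local time
    task: str


def is_due(spec: AgentSpec, now: datetime, last_run: datetime | None) -> bool:
    """True at most once per day, at or after the configured hour."""
    if now.hour < spec.run_hour:
        return False
    return last_run is None or last_run.date() < now.date()


nightly = AgentSpec(
    name="nightly-test-fixer",
    harness="claude-code",
    mcp_servers=("github", "aws"),
    run_hour=2,
    task="fix failing tests and open PRs",
)
print(is_due(nightly, datetime(2026, 6, 17, 2, 5), last_run=None))
```

Because the spec is data and not code inside a notebook, the same definition can be listed, audited, and rescheduled from one place.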

Example: Your coding agent makes 40 calls. The gateway adds <1ms per call (under 40 milliseconds total). The control plane tracks that this agent used Claude on 25 calls and Gemini on 15, at a cost of $0.23. If Claude hits rate limits, the gateway transparently retries on Gemini without the agent knowing.
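A minimal sketch of that fallback behaviour with per-provider accounting. The provider functions are stubs that stand in for real API clients, and the `Gateway` class, the `RateLimited` exception, and the cost figure are all made up for illustration.

```python
from collections import Counter


class RateLimited(Exception):
    """Raised by a provider stub when it refuses a request."""


class Gateway:
    def __init__(self, providers, order):
        self.providers = providers  # name -> callable(prompt) -> (text, cost_usd)
        self.order = order          # preference order, e.g. ["claude", "gemini"]
        self.calls = Counter()      # successful calls per provider
        self.cost_usd = 0.0         # running cost across providers

    def complete(self, prompt: str) -> str:
        """Try providers in order; move to the next only on a rate limit."""
        for name in self.order:
            try:
                text, cost = self.providers[name](prompt)
            except RateLimited:
                continue
            self.calls[name] += 1
            self.cost_usd += cost
            return text
        raise RuntimeError("all providers are rate limited")


def claude_stub(prompt):
    # Simulates the primary provider being rate limited.
    raise RateLimited()


def gemini_stub(prompt):
    return f"gemini: {prompt}", 0.005


gw = Gateway({"claude": claude_stub, "gemini": gemini_stub}, ["claude", "gemini"])
reply = gw.complete("fix the failing test")
print(reply, dict(gw.calls), gw.cost_usd)
```

The caller sees only `reply`; the call counts and cost are what the control plane would read to report usage per agent.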

Claude Code: Now supports Hooks for policy enforcement at key lifecycle events (for example PreToolUse, PostToolUse, and Stop). That's control-plane-level thinking inside the agent harness.
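A minimal sketch of the decision logic such a hook could run before a tool call. It assumes the harness delivers the event as JSON with `tool_name` and `tool_input` fields and treats exit code 2 as "block this call", which is how Claude Code's pre-tool-use hooks are documented at the time of writing. Verify the field names and exit-code semantics against the current docs; the blocked patterns are a hypothetical policy.

```python
import json

# Hypothetical policy: shell commands containing these substrings are refused.
BLOCKED_SUBSTRINGS = ("rm -rf", "DROP TABLE")


def decide(raw_event: str) -> tuple:
    """Return (exit_code, message) for one hook event given as JSON text."""
    event = json.loads(raw_event)
    # Only shell commands are inspected here; other tools pass through.
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for needle in BLOCKED_SUBSTRINGS:
        if needle in command:
            return 2, f"blocked by policy: command contains {needle!r}"
    return 0, ""


# A real hook script would read the event from stdin, print the message to
# stderr, and exit with the returned code. The calls below use inline events.
sample = json.dumps({"tool_name": "Bash", "tool_input": {"command": "rm -rf build"}})
print(decide(sample))
```

Keeping the decision in a pure function makes the policy easy to unit test separately from the harness wiring.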

Codex: Added Managed Agents API for creating/running agents from your own infrastructure. That's recognizing the control plane problem.

LiteLLM-Rust: Launched June 2026 specifically for "coding agent workloads" with <1ms target on Claude Code calls, integrated sandbox support (E2B, Daytona), and durable sessions on the roadmap. That's explicitly targeting the data plane + agent runtime integration.

TrueFoundry, Kong, Portkey: All shipping "agent gateway" features that blur the line — they're trying to build control + data in one platform.

The market is recognizing that governance + routing are different concerns, even if some platforms try to unify them.

If you're running Claude Code for a single developer or small team:

If you're scaling Claude Code or Codex across a team:

If you're mixing multiple harnesses (Claude Code + Codex + OpenCode):

If you need compliance/audit/data residency:

When evaluating control + data infrastructure for coding agents:

Control Plane:

Data Plane:

Integration:

The sexiest discussion is always about latency. But in teams I talk to running Claude Code at scale, the conversation goes:

"Okay, gateway overhead is solved. Now: how do we keep prod from running experiments? How do we keep Agent-A from accessing the customer database? How do I know what happened yesterday when Agent-B deleted something? Can I run this every night? Can I give the junior engineer the ability to create agents without giving them API access?"

That's all control plane work. It's unglamorous, but it's what keeps you awake at 3 AM.

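Coding agents (Claude Code, Codex, OpenCode) need two halves:

  1. A control plane for agent lifecycle, permissions, scheduling, and audit trails
  2. A fast data plane for routing LLM requests with minimal added latency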

A fast gateway solves problem #2. A control platform solves problem #1. Both are table stakes for production.

The platforms that will win here are the ones that make the separation clear and let teams pick the right tool for each job — or that do both well without unnecessary coupling.

What's your experience been? Are you running coding agents on a team? What's the first thing that broke when you tried to scale from one developer to five?

Paul Twist is an AI infrastructure engineer based in Berlin, focused on production agent systems and multi-provider LLM routing.
