{"slug": "why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data", "title": "Why Coding Agents Need Two Halves of Infrastructure: Control Plane + Fast Data Plane", "summary": "A developer argues that coding agents like Claude Code and Codex require two distinct infrastructure layers: a control plane for agent lifecycle, permissions, and scheduling, and a fast data plane for sub-millisecond LLM request routing. The post highlights that current solutions often conflate these needs, leading to operational friction and latency overhead that compounds across multiple LLM calls per task.", "body_md": "You've probably seen the benchmarks by now. Bifrost does 11 microseconds. LiteLLM Python does 40-50ms. The messaging is simple: **latency matters for gateways**. But this misses what teams actually building with Claude Code and Codex have discovered: **the real problem isn't gateway latency alone. It's that coding agents need two completely different infrastructure layers, and teams are treating them like one.**\n\nWhen you deploy Claude Code or Codex into a team, you're actually solving two separate problems:\n\n**What it does:** Manage agent lifecycle, sessions, memory, permissions, scheduling, tool access, and audit trails.\n\n**What teams need here:**\n\n**Why a gateway can't do this:** A gateway sees request-response pairs. It doesn't know that Agent-A is trying to access Customer Database and needs to be blocked, or that you want to run a code review agent every night at 2 AM. These are control decisions that live above the request layer.\n\n**What it does:** Route LLM requests to the right provider, handle fallbacks, track costs, log traffic, enforce budgets.\n\n**Why it matters for coding agents:** Each code edit, test run, or tool invocation is an LLM call. Claude Code can make 30-50 calls per task. If your gateway adds 1ms per call, that's 30-50ms of compounded overhead. 
With 40-50ms per Python gateway call, you're looking at 1.2-2.5 seconds of pure gateway latency on a single 30-50 call task. That's the latency you actually *feel*.\n\nMost gateway discussions treat these as one problem: **a fast gateway that routes LLM requests**. That's necessary, but not sufficient.\n\nHere's what I'm seeing in production teams running Claude Code:\n\nYou can't do #1-4 with a pure gateway. You can't efficiently do #5-8 with a control platform that doesn't understand routing.\n\nTeams are currently solving this by bolting together:\n\nThat works, but it creates operational friction.\n\nThe teams I've talked to who are scaling Claude Code and Codex beyond a single developer describe it like this:\n\n\"Claude Code is incredible when it's one engineer using it locally. The moment we try to run it on a team, we need:\n\n- A place to define 'these are our agents' (not 30 copies in 30 notebooks)\n- A way to say 'this agent runs on a schedule'\n- Control over which tools each agent can access\n- Visibility into what each agent is doing\n- The ability to route to Claude or Gemini based on task type without editing the agent\n- Sub-millisecond gateway latency so 30 LLM calls don't turn into 1.5 seconds of overhead\"\n\nThat's **control plane** + **data plane** thinking. Two distinct layers.\n\n**Example:** You define an agent in the platform UI, attach it to Claude Code, give it GitHub + AWS MCPs, schedule it to run every night on your codebase's failing tests, and it automatically creates PRs with fixes. All without anyone touching provider consoles.\n\n**Example:** Your coding agent makes 40 calls. Gateway adds <1ms per call (under 40 milliseconds total). Control plane tracks that this agent used Claude on 25 calls, Gemini on 15, cost $0.23. 
If Claude hits rate limits, gateway transparently retries on Gemini without the agent knowing.\n\n**Claude Code:** Now supports [Hooks](https://docs.anthropic.com/en/docs/agents/hooks) for policy enforcement at key lifecycle events (TaskStarted, ToolCall, TaskCompleted). That's control-plane-level thinking inside the agent harness.\n\n**Codex:** Added [Managed Agents API](https://openai.com/blog/api-updates-preview) for creating/running agents from your own infrastructure. That's recognizing the control plane problem.\n\n**LiteLLM-Rust:** Launched June 2026 specifically for \"coding agent workloads\" with <1ms target on Claude Code calls, integrated sandbox support (E2B, Daytona), and durable sessions on the roadmap. That's explicitly targeting the data plane + agent runtime integration.\n\n**TrueFoundry, Kong, Portkey:** All shipping \"agent gateway\" features that blur the line — they're trying to build control + data in one platform.\n\nThe market is recognizing that governance + routing are different concerns, even if some platforms try to unify them.\n\n**If you're running Claude Code on a single developer or small team:**\n\n**If you're scaling Claude Code or Codex across a team:**\n\n**If you're mixing multiple harnesses (Claude Code + Codex + OpenCode):**\n\n**If you need compliance/audit/data residency:**\n\nWhen evaluating control + data infrastructure for coding agents:\n\n**Control Plane:**\n\n**Data Plane:**\n\n**Integration:**\n\nThe sexiest discussion is always about latency. But in teams I talk to running Claude Code at scale, the conversation goes:\n\n\"Okay, gateway overhead is solved. Now: how do we keep prod from running experiments? How do we keep Agent-A from accessing the customer database? How do I know what happened yesterday when Agent-B deleted something? Can I run this every night? Can I give the junior engineer the ability to create agents without giving them API access?\"\n\nThat's all control plane work. 
And it's unglamorous, but it's what stops you from sleeping at 3 AM.\n\nCoding agents (Claude Code, Codex, OpenCode) need **two halves**:\n\nA fast gateway solves problem #2. A control platform solves problem #1. **Both are table stakes for production.**\n\nThe platforms that will win here are the ones that make the separation clear and let teams pick the right tool for each job — or that do both well without unnecessary coupling.\n\n**What's your experience been?** Are you running coding agents on a team? What's the first thing that broke when you tried to scale from one developer to five?\n\n*Paul Twist is an AI infrastructure engineer based in Berlin, focused on production agent systems and multi-provider LLM routing.*", "url": "https://wpnews.pro/news/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data", "canonical_source": "https://dev.to/paultwist/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data-plane-2eda", "published_at": "2026-06-17 16:01:30+00:00", "updated_at": "2026-06-17 16:22:02.309863+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "developer-tools", "ai-infrastructure", "ai-products"], "entities": ["Claude Code", "Codex", "LiteLLM", "Bifrost", "Anthropic", "OpenAI", "TrueFoundry", "Kong"], "alternates": {"html": "https://wpnews.pro/news/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data", "markdown": "https://wpnews.pro/news/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data.md", "text": "https://wpnews.pro/news/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data.txt", "jsonld": "https://wpnews.pro/news/why-coding-agents-need-two-halves-of-infrastructure-control-plane-fast-data.jsonld"}}