The AI engineering stack we built internally, on the platform we ship

In the last 30 days, 93% of Cloudflare's R&D organization used AI coding tools built on the company's own platform, with 3,683 internal users making 47.95 million AI requests. The engineering stack, developed by a tiger team called iMARS and now managed by the Dev Productivity team, integrates AI Gateway, Workers AI, and other shipping products to handle authentication, routing, inference, and code quality enforcement. This internal infrastructure has significantly increased developer velocity, with the 4-week rolling average of merge requests climbing from ~5,600 to over 8,700 per week.

In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.

Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling, including CI/CD, build systems, and automation.

Here are some numbers that capture our own agentic AI use over the last 30 days:

- 3,683 internal users actively using AI coding tools (60% company-wide, 93% across R&D), out of approximately 6,100 total employees
- 47.95 million AI requests
- 295 teams currently using agentic AI tools and coding assistants
- 20.18 million AI Gateway requests per month
- 241.37 billion tokens routed through AI Gateway
- 51.83 billion tokens processed on Workers AI

The impact on developer velocity internally is clear: we’ve never seen a quarter-to-quarter increase in merge requests to this degree. As AI tooling adoption has grown, the 4-week rolling average has climbed from ~5,600/week to over 8,700.
The week of March 23 hit 10,952, nearly double the Q4 baseline.

MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.

This post dives deep into what that looked like over the past eleven months and where we ended up. We're publishing now, to close out Agents Week, because the AI engineering stack we built internally runs on the same products we’re shipping and enhancing this week.

The architecture at a glance

The engineer-facing tools layer (OpenCode, Windsurf, and other MCP-compatible clients) includes both open-source and third-party coding assistant tools.

Each layer maps to a Cloudflare product or tool we use:

| What we built | Built with |
|---|---|
| Zero Trust authentication | Cloudflare Access |
| Centralized LLM routing, cost tracking, BYOK, and Zero Data Retention controls | AI Gateway |
| On-platform inference with open-weight models | Workers AI |
| MCP Server Portal with single OAuth | Workers + Access |
| AI Code Reviewer CI integration | Workers + AI Gateway |
| Sandboxed execution for agent-generated code (Code Mode) | Dynamic Workers |
| Stateful, long-running agent sessions | Agents SDK (McpAgent, Durable Objects) |
| Isolated environments for cloning, building, and testing | Sandbox SDK (GA as of Agents Week) |
| Durable multi-step workflows | Workflows (scaled 10x during Agents Week) |
| 16K+ entity knowledge graph | Backstage (OSS) |

None of this is internal-only infrastructure. Everything listed above except Backstage is a shipping product, and many of them got substantial updates during Agents Week.
We’ll walk through this in three acts:

- The platform layer: how authentication, routing, and inference work (AI Gateway, Workers AI, MCP Portal, Code Mode)
- The knowledge layer: how agents understand our systems (Backstage, AGENTS.md)
- The enforcement layer: how we keep quality high at scale (AI Code Reviewer, Engineering Codex)

How AI Gateway helped us stay secure and improve the developer experience

When you have more than 3,600 internal users using AI coding tools daily, you need to solve for access and visibility across many clients, use cases, and roles.

Everything starts with Cloudflare Access, which handles all authentication and zero-trust policy enforcement. Once authenticated, every LLM request routes through AI Gateway. This gives us a single place to manage provider keys, cost tracking, and data retention policies.

The OpenCode AI Gateway overview: 688.46k requests per day, 10.57B tokens per day, routing to four providers through one endpoint.

AI Gateway analytics show how monthly usage is distributed across model providers. Over the last month, internal request volume broke down as follows.

| Provider | Requests/month | Share |
|---|---|---|
| Frontier Labs (OpenAI, Anthropic, Google) | 13.38M | 91.16% |
| Workers AI | 1.3M | 8.84% |

Frontier models handle the bulk of complex agentic coding work for now, but Workers AI is already a significant part of the mix and handles an increasing share of our agentic engineering workloads.

How we increasingly leverage Workers AI

Workers AI is Cloudflare's serverless AI inference platform, which runs open-source models on GPUs across our global network. Beyond huge cost improvements compared to frontier models, a key advantage is that inference stays on the same network as your Workers, Durable Objects, and storage. There are no cross-cloud hops to deal with, which would otherwise add latency, network flakiness, and extra networking configuration to manage.

Workers AI usage in the last month: 51.47B input tokens, 361.12M output tokens.
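As a quick sanity check on the figures above, the following TypeScript sketch recomputes the Workers AI token total and the provider request shares from the rounded numbers quoted in this post. The variable and function names are illustrative assumptions, not part of any Cloudflare tooling, and because the inputs are rounded, the results should match the published figures only approximately.

```typescript
// Rounded figures quoted in this post. All names here are illustrative
// assumptions and do not come from any Cloudflare API.
const inputTokens = 51.47e9;   // Workers AI input tokens, last month
const outputTokens = 361.12e6; // Workers AI output tokens, last month

const frontierRequests = 13.38e6; // OpenAI, Anthropic, and Google combined
const workersAiRequests = 1.3e6;  // Workers AI

// Percentage of `part` within `whole`.
function sharePercent(part: number, whole: number): number {
  return (part / whole) * 100;
}

const totalTokens = inputTokens + outputTokens;
const totalRequests = frontierRequests + workersAiRequests;

const summary = {
  totalTokensBillions: totalTokens / 1e9,
  inputToOutputRatio: inputTokens / outputTokens,
  frontierSharePct: sharePercent(frontierRequests, totalRequests),
  workersAiSharePct: sharePercent(workersAiRequests, totalRequests),
};

console.log(summary);
```

The token total comes out to about 51.83 billion, in line with the headline figure, and input tokens outnumber output tokens by roughly 140 to 1. The recomputed shares land within a few hundredths of a percentage point of the table, which is expected given that the request counts are rounded.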
Kimi K2.5, launched on Workers AI in March 2026, is a frontier-scale open-source model with a 256k context window, tool calling, and structured outputs. As we described in our Kimi K2.5 launch post, we have a security agent that processes over 7 billion tokens per day on Kimi. That would cost an estimated $2.4M per year on a mid-tier proprietary model; on Workers AI, it's 77% cheaper.

Beyond security, we use Workers AI for documentation review in our CI pipeline, for generating AGENTS.md context files across thousands of repositories, and for lightweight inference tasks where same-network latency matters more than peak model capability.

As open-source models continue to improve, we expect Workers AI to handle a growing share of our internal workloads.

One thing we got right early: routing through a single proxy Worker from day one. We could have had clients connect directly to AI Gateway, which would have been simpler to set up initially. But centralizing through a Worker meant we could add per-user attribution, model catalog management, and permission enforcement later without touching any client configs. Every feature described in the bootstrap section below exists because we had that single choke point. The proxy pattern gives you a control plane that direct connections don't, and if we plug in additional coding assistant tools later, the same Worker and discovery endpoint will handle them.

The entire setup starts with one command:

    opencode auth login https://opencode.internal.domain

That command triggers a chain that configures providers, models, MCP servers, agents, commands, and permissions, without the user touching a config file.

Step 1: Discover auth requirements. OpenCode fetches config from a URL like https://opencode.internal.domain/.well-known/opencode.
This discovery endpoint is served by a Worker. The response has an auth block telling OpenCode how to authenticate, along with a config block with providers, MCP servers, agents, commands, and default permissions:

    {
      "auth": {
        "command": ["cloudflared", "access", "login", "..."],
        "env": "TOKEN"
      },
      "config": {
        "provider": { "..." },
        "mcp": { "..." },
        "agent": { "..." },
        "command": { "..." },
        "permission": { "..." }
      }
    }

Step 2: Authenticate via Cloudflare Access. OpenCode runs the auth command and the user authenticates through the same SSO they use for everything else at Cloudflare. cloudflared returns a signed JWT. OpenCode stores it locally and automatically attaches it to every subsequent provider request.

Step 3: Config is merged into OpenCode. The provided config supplies shared defaults for the entire organization, but local configs always take priority. Users can override the default model, add their own agents, or adjust project- and user-scoped permissions without affecting anyone else.

Inside the proxy Worker. The Worker is a simple Hono app that does three things:

Serves the shared config. The config is compiled at deploy time from structured source files and contains placeholder values like {baseURL} for the Worker's origin. At request time, the Worker replaces these, so all provider requests route through the Worker rather than directly to model providers. Each provider gets a path prefix (/anthropic, /openai, /google-ai-studio/v1beta, and /compat for Workers AI) that the Worker forwards to the corresponding AI Gateway route.

Proxies requests to AI Gateway. When OpenCode sends a request like POST /anthropic/v1/messages, the Worker validates the Cloudflare Access JWT, then rewrites headers before forwarding:

Stripped: authorization, cf-access-token, host

Added: cf-aig-authorization: Bearer