Uber Burned Through Its Entire AI Coding Budget in 4 Months. Here's What Smart Teams Do Instead. A developer reports that AI coding costs can be slashed by 70% through task-level model routing instead of using a single frontier model for all tasks. After profiling usage, the developer found that 60% of prompts to expensive models like Claude Opus could be handled by cheaper alternatives like Haiku or Gemini Flash. Implementing a three-tier routing system reduced monthly spend from $10,000 to $3,000 without sacrificing output quality. The AI coding bill just became everyone's problem. In the last two weeks alone: The pattern is clear: agentic workflows burn tokens faster than any flat budget anticipated. And single-vendor lock-in makes it worse — when your only option is Opus 4.8 at $75/M output tokens, every wasted thinking loop is expensive. Here's what I learned after watching my own AI coding spend hit $10K/month earlier this year. I was sending everything to Claude Opus. Code planning? Opus. Writing unit tests? Opus. Formatting a config file? Opus. Renaming a variable across three files? Opus. That's like hiring a senior architect to move furniture. The work gets done, but you're massively overpaying. When I actually profiled my usage, the breakdown looked like this: That 60% was burning frontier-tier tokens for work that Haiku, Gemini Flash, or even a local model could handle identically. The concept is simple: instead of routing every request to one model, classify each task and send it to the cheapest model that can handle it well. Planning phase → Frontier model Opus, GPT-5 . This is where reasoning depth matters. You want the model that catches edge cases your spec missed. Implementation → Mid-tier model Sonnet, GPT-4.1 . Given a clear plan, most code generation doesn't need maximum intelligence — it needs reliable instruction-following. Tests, formatting, docs → Fast/cheap model Haiku, Flash, Gemini 2.5 . These tasks have objectively verifiable outputs. Either the test passes or it doesn't. You don't need 200 IQ for assertEqual . Debug/diagnosis → Frontier model again. When something breaks in a non-obvious way, you want the best reasoning available. After implementing this approach, my monthly spend dropped from ~$10K to ~$3K. Same output quality. Same velocity. Just stopped overpaying for routine work. You don't need custom infrastructure. Here's the practical version: Before optimizing, know where your tokens go. Log the actual prompts hitting the API for a week. You'll probably find: Start simple — three tiers is enough: The key insight: routing should happen at the task level, not the session level. A single coding session might need Opus for the initial design, Sonnet for implementation, and Haiku for writing tests — all within the same workflow. Most teams I've talked to start with manual routing just switching models themselves and then automate it once they see the pattern. Track cost-per-task, not just total spend. When you see a Tier 3 task consuming $2 worth of tokens on a frontier model, that's a routing failure. When a Tier 1 task fails on a cheap model, that's also a routing failure. The sweet spot is in the middle. Ramp's data tells an interesting story: the companies spending the most on AI aren't the ones in trouble. The ones in trouble are companies locked into a single vendor with no ability to route. "The top 1% of firms tend to mix and match, bouncing between multiple frontier models and platforms that give them access to cheaper models." — Ramp AI Index This isn't about spending less on AI. It's about spending smarter . The teams that figure out task-level routing now will have a structural cost advantage as agentic workflows become the default. The $10K/month developer AI bill is already here. The question is whether you're paying it because you need to, or because you never bothered to check which tasks actually require the expensive model. I've been building apps with AI coding tools for the past year and tracking the economics obsessively. Happy to share specific numbers or discuss routing strategies in the comments.