Cutting Claude API Costs in Half with a 3-Tier Routing System (Haiku/Sonnet/Opus)

A developer building an ad analytics SaaS reduced Claude API costs from $180-200/month to $95-110/month by implementing a three-tier routing system that assigns tasks to Haiku, Sonnet, or Opus based on context length rather than task complexity. The system uses Haiku to classify incoming tasks in ~100 tokens, with an 8% retry rate when Haiku falls back to Sonnet. The key insight was that context length under 2,000 tokens allows Haiku to handle complex tasks, while context over 5,000 tokens degrades performance regardless of task difficulty.

Adding more Claude subagents made my pipeline slower past 6 — but the real problem wasn't concurrency at all. When I finally looked at the cost logs for my ad analytics SaaS, every task was hitting Sonnet: renaming files, formatting Slack messages, parsing JSON, and interpreting 12-campaign performance reports. All the same model. Sonnet 4.5 runs $3/M input and $15/M output tokens. Haiku 3.5 is $0.80/$4. Same tokens, 3-4x cost difference based purely on model choice. I split tasks into three tiers — Haiku for format/parse/extract work with no judgment needed, Sonnet for pattern recognition and multi-step tool use, Opus for architectural decisions currently one worker out of twelve, run manually . The routing decision itself is made by Haiku classifying the incoming task in ~100 tokens, which costs roughly $0.00008 per call — noise compared to the savings from avoiding a wrong-model assignment. The counter-intuitive finding: task complexity mattered less than context length. I expected complex tasks to need Sonnet. What I actually found was that Haiku handled surprisingly hard work just fine when context was compressed under 2,000 tokens — and fell apart on simple tasks when context ballooned past 5,000. So context length is now the first branch in my router, not task type. js const modelMap: Record<Tier, string = { 1: "claude-haiku-3-5", 2: "claude-sonnet-4-5", 3: "claude-opus-4", }; After six months in production: API spend dropped from $180-200/month to $95-110. Not a clean 50% cut — Haiku retries about 8% of calls fall back to Sonnet eat into it. But even counting retry costs, the routing system pays for itself many times over. Trying to get retry rate to 0% by defaulting everything to Sonnet would cost more than tolerating the 8%. I also hit a D1 too many variables error three days after deploy — batching 100 routing log rows at 7 columns each blew past SQLite's 999-variable limit. Dropping batch size to 30 fixed it. Not a routing problem, just a logging assumption that didn't survive contact with reality. The full breakdown — including the rule-based pre-filter I'm testing to skip the LLM routing call entirely for 90% of tasks, and the open question of when Opus actually justifies pipeline inclusion — is over on riversealab.