Cutting Claude API Costs in Half with a 3-Tier Routing System (Haiku/Sonnet/Opus)

A developer building an ad analytics SaaS reduced Claude API costs from $180-200/month to $95-110/month by implementing a three-tier routing system that assigns tasks to Haiku, Sonnet, or Opus based on context length rather than task complexity. The system uses Haiku to classify incoming tasks in ~100 tokens, with an 8% retry rate for cases where Haiku falls back to Sonnet. The key insight was that context length under 2,000 tokens allows Haiku to handle complex tasks, while context over 5,000 tokens degrades its performance regardless of task difficulty.

Adding more Claude subagents made my pipeline slower past 6, but the real problem wasn't concurrency at all.

When I finally looked at the cost logs for my ad analytics SaaS, every task was hitting Sonnet: renaming files, formatting Slack messages, parsing JSON, and interpreting 12-campaign performance reports. All the same model.

Sonnet 4.5 runs $3/M input and $15/M output tokens. Haiku 3.5 is $0.80/$4. Same tokens, 3-4x cost difference based purely on model choice.

I split tasks into three tiers: Haiku for format/parse/extract work with no judgment needed, Sonnet for pattern recognition and multi-step tool use, and Opus for architectural decisions (currently one worker out of twelve, run manually). Haiku makes the routing decision itself by classifying the incoming task in ~100 tokens, which costs roughly $0.00008 per call. That is noise compared to the savings from avoiding a wrong-model assignment.

The counter-intuitive finding: task complexity mattered less than context length. I expected complex tasks to need Sonnet. What I actually found was that Haiku handled surprisingly hard work just fine when context was compressed under 2,000 tokens, and fell apart on simple tasks when context ballooned past 5,000. So context length is now the first branch in my router, not task type.

js const modelMap: Record