Adding Claude subagents past six made my pipeline slower, but the real problem wasn't concurrency at all.
When I finally looked at the cost logs for my ad analytics SaaS, every task was hitting Sonnet: renaming files, formatting Slack messages, parsing JSON, and interpreting 12-campaign performance reports. All on the same model. Sonnet 4.5 runs $3/M input and $15/M output tokens; Haiku 3.5 is $0.80/$4. Same tokens, a 3.75x cost difference on both input and output, based purely on model choice.
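A per-call cost function is a quick way to sanity-check that ratio. This is a minimal sketch using the prices quoted above (verify them against Anthropic's current pricing before relying on them); the model keys are illustrative labels, not API model IDs.

```typescript
// Illustrative labels, not Anthropic API model IDs.
type ModelName = "haiku-3.5" | "sonnet-4.5";

interface Price {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// Prices as quoted in the post.
const PRICES: Record<ModelName, Price> = {
  "haiku-3.5": { inputPerMTok: 0.8, outputPerMTok: 4 },
  "sonnet-4.5": { inputPerMTok: 3, outputPerMTok: 15 },
};

// USD cost of a single call with the given token counts.
function callCostUsd(model: ModelName, inputTokens: number, outputTokens: number): number {
  const p = PRICES[model];
  return (inputTokens * p.inputPerMTok + outputTokens * p.outputPerMTok) / 1_000_000;
}

// Example: a 3,000-token prompt with a 500-token answer on each model.
const haiku = callCostUsd("haiku-3.5", 3000, 500);
const sonnet = callCostUsd("sonnet-4.5", 3000, 500);
console.log(haiku.toFixed(4), sonnet.toFixed(4), (sonnet / haiku).toFixed(2));
```

Because the input and output prices differ by the same factor (3 / 0.80 = 15 / 4 = 3.75), the ratio comes out to 3.75x no matter how a call splits between input and output tokens.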
I split tasks into three tiers: Haiku for format/parse/extract work with no judgment needed, Sonnet for pattern recognition and multi-step tool use, and Opus for architectural decisions (currently one worker out of twelve, run manually). The routing decision itself is made by Haiku classifying the incoming task in ~100 input tokens, roughly $0.00008 per call: noise compared to the savings from avoiding a wrong-model assignment.
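The Haiku classification step could look roughly like the sketch below. Everything here is an assumption for illustration: the prompt wording, the `Complete` function type (a hypothetical stand-in for whatever SDK call you actually make, injected so the logic is testable offline), and defaulting unparseable replies to tier 2.

```typescript
type Tier = 1 | 2 | 3;

// Hypothetical signature for your model client; not a real SDK API.
type Complete = (model: string, prompt: string, maxTokens: number) => Promise<string>;

// Assumption: Anthropic's alias for Haiku 3.5.
const CLASSIFIER_MODEL = "claude-3-5-haiku-latest";

// Short prompt so the whole call stays near the ~100-token budget.
function buildClassifierPrompt(task: string): string {
  return [
    "Classify the task into one tier. Reply with a single digit.",
    "1 = format, parse, or extract; no judgment needed",
    "2 = pattern recognition or multi-step tool use",
    "3 = architectural decision",
    "",
    `Task: ${task}`,
  ].join("\n");
}

// Take the first 1, 2, or 3 in the reply; anything else falls back to tier 2
// (assumption: Sonnet is the safe middle when the classifier is unclear).
function parseTier(reply: string): Tier {
  const m = reply.match(/[123]/);
  return m ? (Number(m[0]) as Tier) : 2;
}

async function classifyTask(task: string, complete: Complete): Promise<Tier> {
  const reply = await complete(CLASSIFIER_MODEL, buildClassifierPrompt(task), 5);
  return parseTier(reply);
}
```

At 100 input tokens and $0.80/M, the input side is 100 × 0.80 / 1,000,000 = $0.00008; the one-digit reply adds a few output tokens at $4/M. Since Opus runs manually here, a tier-3 answer might be better used to flag the task for review than to call Opus automatically.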
The counter-intuitive finding: task complexity mattered less than context length. I expected complex tasks to need Sonnet. What I actually found was that Haiku handled surprisingly hard work just fine when context was compressed under 2,000 tokens — and fell apart on simple tasks when context ballooned past 5,000. So context length is now the first branch in my router, not task type.
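Putting context length first in the router might look like this sketch. The 2,000- and 5,000-token thresholds come from the numbers above; the 4-characters-per-token estimate, sending tier-2 work to Haiku below 2,000 tokens, and deferring to the task-type tier in the 2,000 to 5,000 band are assumptions for illustration.

```typescript
type Tier = 1 | 2 | 3;

const HAIKU_SAFE_MAX = 2_000; // below this, Haiku handled hard tasks fine
const HAIKU_UNSAFE_MIN = 5_000; // above this, Haiku struggled even on simple tasks

// Assumption: rough 4-characters-per-token heuristic; use a real token count if available.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// First branch is context length; task type only decides the middle band.
// Tier 3 always passes through untouched (Opus is handled separately).
function routeByContext(contextTokens: number, taskTier: Tier): Tier {
  if (taskTier === 3) return 3;
  if (contextTokens > HAIKU_UNSAFE_MIN) return 2; // long context: never Haiku
  if (contextTokens < HAIKU_SAFE_MAX) return 1; // compressed context: Haiku, even for tier-2 work
  return taskTier; // 2,000 to 5,000 tokens: trust the task-type classification
}
```

The design choice is that a cheap, deterministic check (token count) runs before any judgment about task type, so the classifier's answer only matters where context length doesn't already decide the model.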
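```typescript
type Tier = 1 | 2 | 3;

// Tier -> Anthropic API model alias (pin dated snapshot IDs in production).
const modelMap: Record<Tier, string> = {
  1: "claude-3-5-haiku-latest", // format / parse / extract
  2: "claude-sonnet-4-5", // pattern recognition, multi-step tool use
  3: "claude-opus-4-0", // architectural decisions, run manually
};
```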
After six months in production, API spend dropped from $180-200/month to $95-110/month. Not a clean 50% cut: Haiku retries (about 8% of calls fall back to Sonnet) eat into it. But even counting retry costs, the routing system pays for itself many times over. Trying to get the retry rate to 0% by defaulting everything to Sonnet would cost more than tolerating the 8%.
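The Haiku-then-Sonnet retry can be sketched as a small wrapper. The `Call` and `Validate` types are hypothetical stand-ins (the post doesn't say what counts as a failed Haiku answer; a schema or JSON-parse check is one option), and `relativeCost` assumes a retried task uses the same token counts on both models.

```typescript
type Model = "haiku" | "sonnet";

// Hypothetical injected client call and output validator; neither is a real SDK API.
type Call = (model: Model, prompt: string) => Promise<string>;
type Validate = (output: string) => boolean;

interface Result {
  output: string;
  modelsUsed: Model[]; // which models were billed for this task
}

// Try Haiku first; if its output fails validation or the call throws, retry once on Sonnet.
async function runWithFallback(prompt: string, call: Call, isValid: Validate): Promise<Result> {
  try {
    const haikuOut = await call("haiku", prompt);
    if (isValid(haikuOut)) return { output: haikuOut, modelsUsed: ["haiku"] };
  } catch {
    // Treat a Haiku error like an invalid answer and fall through to Sonnet.
  }
  const sonnetOut = await call("sonnet", prompt);
  return { output: sonnetOut, modelsUsed: ["haiku", "sonnet"] };
}

// Expected cost of a Haiku-routed task relative to sending it straight to Sonnet:
// every task pays Haiku (1 / priceRatio), failures also pay full Sonnet price.
function relativeCost(fallbackRate: number, priceRatio = 3.75): number {
  return 1 / priceRatio + fallbackRate;
}
```

`relativeCost(0.08)` is about 0.35 (1 / 3.75 + 0.08): for Haiku-routed tasks, paying Haiku every time plus Sonnet on 8% of them still costs roughly a third of going straight to Sonnet. Not every task is Haiku-eligible, which is one reason the overall cut is smaller than that.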
I also hit a D1 "too many variables" error three days after deploy: batching 100 routing log rows at 7 columns each blew past SQLite's 999-variable limit. Dropping the batch size to 30 fixed it. Not a routing problem, just a logging assumption that didn't survive contact with reality.
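Sizing batches from the parameter budget rather than a fixed row count avoids this class of bug. A minimal sketch, assuming multi-row `INSERT ... VALUES (...), (...)` statements; table and column names are left to the caller, and `maxParams` should be whatever limit your database actually enforces (Cloudflare documents D1's on its limits page).

```typescript
type Value = string | number | null;
type Row = readonly Value[];

interface Statement {
  sql: string;
  params: Value[];
}

// Split rows into multi-row INSERTs so no statement binds more than maxParams values.
function buildBatchedInserts(
  table: string,
  columns: readonly string[],
  rows: readonly Row[],
  maxParams: number,
): Statement[] {
  const rowsPerStatement = Math.floor(maxParams / columns.length);
  if (rowsPerStatement < 1) throw new Error("a single row exceeds maxParams");
  const placeholderRow = `(${columns.map(() => "?").join(", ")})`;
  const statements: Statement[] = [];
  for (let i = 0; i < rows.length; i += rowsPerStatement) {
    const chunk = rows.slice(i, i + rowsPerStatement);
    const params: Value[] = [];
    for (const r of chunk) {
      if (r.length !== columns.length) throw new Error("row width does not match columns");
      params.push(...r);
    }
    statements.push({
      sql: `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${chunk.map(() => placeholderRow).join(", ")}`,
      params,
    });
  }
  return statements;
}
```

On Workers, each statement would then go through D1's `prepare(sql).bind(...params)`, and the prepared statements can be sent together with `batch()`.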
The full breakdown — including the rule-based pre-filter I'm testing to skip the LLM routing call entirely for 90% of tasks, and the open question of when Opus actually justifies pipeline inclusion — is over on riversealab.