{"slug": "more-parallel-subagents-made-my-pipeline-slower-here-s-the-data", "title": "More parallel subagents made my pipeline slower. Here's the data.", "summary": "A developer found that adding more parallel subagents to an ad-creative analysis pipeline increased latency rather than decreasing it, because context assembly before the LLM call became the bottleneck. With 8 subagents, aggregation consumed 61% of wall-clock time, and serializing JSON blobs took over 4 seconds. The fix was to have subagents write summaries to R2, reducing aggregation context from ~6,400 to ~1,100 tokens and cutting monthly costs from $207 to $38.", "body_md": "Adding a 7th subagent pushed my orchestrator latency from 22s to 31s — the opposite of what I expected.\n\nI'd been running a fanout pattern in my ad-creative analysis SaaS: spawn N subagents in parallel, collect results, merge into one verdict. The parallel part worked fine. Individual subagents finished in 9–12 seconds regardless of how many I spawned. The problem was everything after that.\n\nWith 8 subagents, each returning ~800 tokens of analysis, the orchestrator was assembling a 6,400-token context before it could even call the LLM once. On Cloudflare Workers, serializing 8 JSON blobs into a single prompt string was taking **4+ seconds of pure CPU time** before the first API call fired. The log entry that made it obvious:\n\n```\n[worker:orchestrator] WARN\n  aggregate_context_size=52480 bytes\n  serialize_duration=4312ms\n  reason=\"context_assembly_backpressure\"\n```\n\nMeasured across 3 weeks of production data:\n\n| Subagents | Total latency | Aggregation share |\n|---|---|---|\n| 2 | 14.2s | 18% |\n| 4 | 16.8s | 31% |\n| 6 | 22.4s | 47% |\n| 8 | 31.1s | 61% |\n\nAt 6+ subagents, aggregation consumed more than half the wall-clock time. The fanout was fast. The funnel was the bottleneck.\n\nThe fix wasn't reducing parallelism — it was changing what the orchestrator actually reads. 
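
Why not just cap the fanout at two or four agents? Splitting each total in the latency table into its aggregation slice and the remainder makes the answer visible. The sketch below uses only the table's numbers and the ~800 tokens per result mentioned earlier. `decompose` and `fullContextTokens` are hypothetical helpers for illustration, not code from the pipeline. It also assumes everything outside the aggregation share is the parallel fanout, which is an approximation.

```typescript
// Figures copied from the latency table: subagent count,
// total wall-clock latency in seconds, and the aggregation share of that total.
interface Row {
  subagents: number;
  totalSec: number;
  aggShare: number;
}

const rows: Row[] = [
  { subagents: 2, totalSec: 14.2, aggShare: 0.18 },
  { subagents: 4, totalSec: 16.8, aggShare: 0.31 },
  { subagents: 6, totalSec: 22.4, aggShare: 0.47 },
  { subagents: 8, totalSec: 31.1, aggShare: 0.61 },
];

// Approximate tokens each subagent returns (~800, per the post).
const TOKENS_PER_RESULT = 800;

// Hypothetical helper: aggregation context size when every full result
// is pasted into the prompt.
function fullContextTokens(subagents: number): number {
  return subagents * TOKENS_PER_RESULT;
}

// Hypothetical helper: split a total into aggregation seconds and the remainder.
// Assumption: the remainder is treated as the parallel fanout.
function decompose(row: Row): { aggSec: number; restSec: number } {
  const aggSec = row.totalSec * row.aggShare;
  return { aggSec, restSec: row.totalSec - aggSec };
}

for (const row of rows) {
  const { aggSec, restSec } = decompose(row);
  console.log(
    `${row.subagents} agents, ${fullContextTokens(row.subagents)} tokens: ` +
      `aggregation ${aggSec.toFixed(1)}s, rest ${restSec.toFixed(1)}s`,
  );
}
```

Worked through, the non-aggregation remainder stays between roughly 11.6s and 12.1s at every width. That lines up with subagents finishing in 9 to 12 seconds regardless of count. Meanwhile the aggregation slice grows from about 2.6s to about 19s. The parallelism itself was close to free; all of the growth sits in the aggregation step.
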
Instead of passing full results to the aggregation LLM call, each subagent now writes to R2 on completion. The orchestrator pulls only a three-field summary struct per agent (`verdict`, `confidence`, `top_signal`). Eight agents still produce eight files, but the aggregation context dropped from ~6,400 tokens to ~1,100. Monthly cost for that one pipeline step: $207 → $38.\n\nThe counterintuitive part: the bottleneck wasn't the LLM. It was the context assembly happening before the LLM even got called.\n\nI wrote up the full breakdown — including the R2 chunking pattern, the D1 counter approach for tracking partial completions without polling, and the KV-based loop guard for failed aggregation retries — over on riversealab.com.", "url": "https://wpnews.pro/news/more-parallel-subagents-made-my-pipeline-slower-here-s-the-data", "canonical_source": "https://dev.to/riversea/more-parallel-subagents-made-my-pipeline-slower-heres-the-data-4fic", "published_at": "2026-06-17 06:16:43+00:00", "updated_at": "2026-06-17 06:21:22.318108+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "developer-tools"], "entities": ["Cloudflare Workers", "R2", "LLM"], "alternates": {"html": "https://wpnews.pro/news/more-parallel-subagents-made-my-pipeline-slower-here-s-the-data", "markdown": "https://wpnews.pro/news/more-parallel-subagents-made-my-pipeline-slower-here-s-the-data.md", "text": "https://wpnews.pro/news/more-parallel-subagents-made-my-pipeline-slower-here-s-the-data.txt", "jsonld": "https://wpnews.pro/news/more-parallel-subagents-made-my-pipeline-slower-here-s-the-data.jsonld"}}