Claude Code v2.1.172: Sub-Agents Can Now Spawn Sub-Agents Anthropic released Claude Code v2.1.172 on June 10, allowing sub-agents to spawn their own sub-agents up to five levels deep, a change from the previous two-year ban. The feature aims to isolate noisy work from the main context, reducing token overhead, but nesting multiplies token consumption by roughly 7× per branch per level, risking cost explosions. Anthropic recommends setting spend limits and tiering models (Opus, Sonnet, Haiku) across levels to manage expenses. For two years, Anthropic enforced one rule in Claude Code without exception: a sub-agent cannot spawn another sub-agent. Version 2.1.172, released June 10, quietly ended that. The changelog entry is a single line. The implications for how you architect agentic workflows are not. What Changed in v2.1.172 Sub-agents can now launch their own sub-agents, nesting up to five levels deep. The five-level cap is hard — enforced server-side, with no setting to raise, lower, or disable it. Boris Cherny, who leads Claude Code at Anthropic, described the motivation plainly: “agents kicking off agents as a way to better manage context.” That framing is worth holding onto. The feature is not about running more work in parallel. It is about running the noisy work somewhere the main conversation cannot see it. Before this release, a sub-agent handling log analysis would flood its parent’s context with thousands of raw log tokens. The parent then burned additional tokens re-grounding itself before continuing useful work. Nested sub-agents solve that by isolating the log-reader in its own context frame and returning only a summary upward. The 5-Frame Stack: A Mental Model That Holds Think of nested sub-agents as call-stack recursion with a five-frame depth limit. Each frame carries its own system prompt, its own model selection, and its own 200K token context window. The parent reads only what the leaf returns. Everything in between — the searches, the file reads, the intermediate reasoning — costs tokens and then disappears from the parent’s view. A practical debugging chain looks like this: main session Opus → triage-lead Opus → repro-runner Sonnet → log-summariser Haiku Layer 1 loads the incident. Layer 2 fans out across four candidate causes. The agent investigating log correlations — a noisy, token-heavy task — spawns a Layer 3 Haiku agent to do the grep work and return a one-line result. The main thread sees four clean verdicts, not 50,000 tokens of raw log output. That is the use case: not more agents, but cleaner results from the agents you already have. Most useful chains live at depth 2–3. Five levels is the ceiling, not the target. Token Math: Read This Before You Nest Anything Nesting multiplies token consumption at roughly 7× per branch per level. That is not a rough estimate — it is the observed overhead from agent orchestration, context initialization, and summary-passing between frames. It compounds fast. A developer on the community forums described hitting 887,000 tokens per minute before noticing. A financial services team ran a “simple” code quality project with 23 sub-agents and received a $47,000 invoice three days later https://www.aicosts.ai/blog/claude-code-subagent-cost-explosion-887k-tokens-minute-crisis . The five-level depth cap prevents infinite loops. It does not prevent your billing alert from triggering at 3 AM. Set a spend limit before nesting anything in production. Claude Code’s billing settings support per-session caps. Use them. How to Configure Nested Sub-Agents Sub-agent definitions live in .claude/agents/