Claude Code v2.1.172: Sub-Agents Can Now Spawn Sub-Agents

Anthropic released Claude Code v2.1.172 on June 10, allowing sub-agents to spawn their own sub-agents up to five levels deep and ending a restriction that had stood for two years. The feature isolates noisy work from the main context so the parent sees only summaries, but nesting multiplies token consumption by roughly 7× per branch per level, risking cost explosions. To manage expenses, set spend limits and tier models (Opus, Sonnet, Haiku) across levels.

For two years, Anthropic enforced one rule in Claude Code without exception: a sub-agent cannot spawn another sub-agent. Version 2.1.172, released June 10, quietly ended that. The changelog entry is a single line. The implications for how you architect agentic workflows are not.

What Changed in v2.1.172

Sub-agents can now launch their own sub-agents, nesting up to five levels deep. The five-level cap is hard: it is enforced server-side, with no setting to raise, lower, or disable it.

Boris Cherny, who leads Claude Code at Anthropic, described the motivation plainly: “agents kicking off agents as a way to better manage context.” That framing is worth holding onto. The feature is not about running more work in parallel. It is about running the noisy work somewhere the main conversation cannot see it.

Before this release, a sub-agent handling log analysis would flood its parent’s context with thousands of raw log tokens. The parent then burned additional tokens re-grounding itself before continuing useful work. Nested sub-agents solve that by isolating the log-reader in its own context frame and returning only a summary upward.

The 5-Frame Stack: A Mental Model That Holds

Think of nested sub-agents as call-stack recursion with a five-frame depth limit. Each frame carries its own system prompt, its own model selection, and its own 200K token context window. The parent reads only what the leaf returns. Everything in between (the searches, the file reads, the intermediate reasoning) costs tokens and then disappears from the parent’s view.

A practical debugging chain looks like this:

    main session (Opus) → triage-lead (Opus) → repro-runner (Sonnet) → log-summariser (Haiku)

Layer 1 loads the incident. Layer 2 fans out across four candidate causes. The agent investigating log correlations, a noisy, token-heavy task, spawns a Layer 3 Haiku agent to do the grep work and return a one-line result. The main thread sees four clean verdicts, not 50,000 tokens of raw log output.
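
The call-stack analogy can be made concrete with a toy model. The Python sketch below is illustrative only and does not call Claude Code: `Frame`, `spawn`, `delegate`, and `DepthLimitError` are hypothetical names, and counting the main session as depth 0 (so sub-agents occupy depths 1 to 5) is an assumption about how the cap is counted. It shows the two properties described above: a spawn past the cap fails, and a parent's context grows by one summary line rather than by the child's raw input.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

MAX_DEPTH = 5  # the hard nesting cap described in the article


class DepthLimitError(RuntimeError):
    """Hypothetical error for a spawn attempt past the cap."""


@dataclass
class Frame:
    """One context frame: an agent with its own model and its own context."""
    name: str
    model: str
    depth: int = 0  # assumption: main session is depth 0, sub-agents are 1..5
    context: list[str] = field(default_factory=list)

    def spawn(self, name: str, model: str) -> Frame:
        """Create a child frame one level deeper, refusing to pass the cap."""
        if self.depth >= MAX_DEPTH:
            raise DepthLimitError(f"{self.name} is already at depth {self.depth}")
        return Frame(name=name, model=model, depth=self.depth + 1)

    def delegate(self, child: Frame, raw: list[str],
                 summarise: Callable[[list[str]], str]) -> str:
        """The child absorbs the raw input; only its summary reaches this frame."""
        child.context.extend(raw)
        summary = summarise(child.context)
        self.context.append(summary)
        return summary


# Rebuild the debugging chain from the text.
main = Frame("main session", "opus")
triage = main.spawn("triage-lead", "opus")
repro = triage.spawn("repro-runner", "sonnet")
logs = repro.spawn("log-summariser", "haiku")

# The leaf reads 50,000 raw lines; its parent receives a single line.
raw_logs = [f"log line {i}" for i in range(50_000)]
verdict = repro.delegate(logs, raw_logs,
                         lambda ctx: f"scanned {len(ctx)} lines, no match")

print(len(logs.context), len(repro.context), verdict)
```

In this toy run the leaf's context holds all 50,000 lines while repro-runner's context holds exactly one entry, the summary string.
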
That is the use case: not more agents, but cleaner results from the agents you already have. Most useful chains live at depth 2–3. Five levels is the ceiling, not the target.

Token Math: Read This Before You Nest Anything

Nesting multiplies token consumption by roughly 7× per branch per level. The figure is approximate, but it is not a guess: it is the observed overhead from agent orchestration, context initialization, and summary-passing between frames. It compounds fast.

A developer on the community forums described hitting 887,000 tokens per minute before noticing. A financial services team ran a “simple” code quality project with 23 sub-agents and received a $47,000 invoice three days later (https://www.aicosts.ai/blog/claude-code-subagent-cost-explosion-887k-tokens-minute-crisis).

The five-level depth cap prevents infinite loops. It does not prevent your billing alert from triggering at 3 AM. Set a spend limit before nesting anything in production. Claude Code’s billing settings support per-session caps. Use them.

How to Configure Nested Sub-Agents

Sub-agent definitions live in .claude/agents/<name>.md at the project level or ~/.claude/agents/<name>.md for user scope. The Agent entry in tools is new as of this release: it is the allowlist controlling which sub-agent types this agent is permitted to spawn.

    ---
    name: triage-lead
    model: claude-opus-4-6
    tools:
      - Read
      - Bash
      - Agent(repro-runner, log-summariser)
    ---
    You are a debugging triage lead. Load the incident, delegate to repro-runner for reproduction, delegate to log-summariser for log correlation, and return a single verdict: confirmed, unconfirmed, or needs-more-data.

Model tiering across levels is not optional if cost matters. Run Opus at the orchestration layer, Sonnet for mid-level implementation work, and Haiku for leaf tasks like log reads, grep, and test generation. The official sub-agents documentation (https://code.claude.com/docs/en/sub-agents) covers the full configuration spec.
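
To put rough numbers on the compounding and the tiering rule, here is a minimal Python sketch. It reads the roughly 7× figure as multiplying once per level of nesting on a single branch, which is one interpretation of "per branch per level" and not a measured model. The function names `branch_multiplier` and `tier_for` are hypothetical, and the tier names are plain labels, not model identifiers.

```python
OVERHEAD_PER_LEVEL = 7  # the article's rough multiplier (approximate)
MAX_DEPTH = 5           # hard nesting cap


def branch_multiplier(depth: int) -> int:
    """Token multiplier for one branch nested `depth` levels, relative to inline work.

    Assumption: the ~7x overhead compounds multiplicatively per level.
    """
    if not 0 <= depth <= MAX_DEPTH:
        raise ValueError(f"depth must be between 0 and {MAX_DEPTH}")
    return OVERHEAD_PER_LEVEL ** depth


def tier_for(depth: int, leaf_depth: int) -> str:
    """Pick a model tier for a frame, following the tiering advice above."""
    if depth >= leaf_depth:
        return "haiku"   # leaf tasks: log reads, grep, test generation
    if depth <= 1:
        return "opus"    # orchestration layer
    return "sonnet"      # mid-level implementation work


# A depth-3 chain: main session (0), triage lead (1), mid-level worker (2), leaf (3).
leaf_depth = 3
for depth in range(leaf_depth + 1):
    print(depth, tier_for(depth, leaf_depth), f"{branch_multiplier(depth)}x")
```

Under this reading a depth-3 leaf already sits at 343× the inline baseline and a full five-level branch at 16,807×, which is why the cheapest tier belongs at the leaves and why most chains should stop at depth 2–3.
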
In practice, tiered routing costs roughly $0.98 per session versus $2.02 for uniform Opus, a 51% reduction with no quality loss on leaf tasks.

Three Pitfalls to Avoid

Token explosion via circular spawning. The depth cap prevents infinite nesting, not circular logic. One production case: a triage agent spawned a “general researcher” that spawned an “investigation specialist” that spawned a “log reader” that spawned a general researcher again. Four levels deep, circular, zero useful output, substantial cost. Use explicit Agent allowlists. If the model cannot complete the task with the specified children, fix the task description; do not widen the allowlist.

Silent failures at depth five. When a Level-5 agent attempts to spawn a Level-6, it receives an error. Without explicit handling, that error propagates silently or surfaces as unexpected output at the top level. Test your nesting depth explicitly with a canary chain before deploying complex hierarchies in production.

Nesting tasks that are already short. The per-branch overhead makes nesting expensive for any task producing less than roughly 1,000 tokens of output. If the child task is short, do it inline. Nesting a ten-line grep into its own context frame costs more in orchestration overhead than the isolation is worth.

What to Do Now

Update to v2.1.172 via npm install -g @anthropic-ai/claude-code. The full release notes on GitHub (https://github.com/anthropics/claude-code/releases/tag/v2.1.172) cover all 30 changes, including Bedrock region resolution improvements and idle CPU reduction, both worth reading. The DevelopersIO breakdown (https://dev.classmethod.jp/en/articles/20260611-cc-updates-v2-1-172/) has the most detailed technical analysis of the infrastructure changes that made nested sub-agents possible.

On nested sub-agents: the feature is well-suited for workflows where context pollution is the bottleneck, not throughput.
If your main session is losing focus because one agent is drowning it in raw data, nesting is the right solution. If you are thinking about nesting to run more tasks in parallel, that is flat parallel execution, a different feature.

Depth 2–3 is where this feature earns its overhead. Five levels is there when you need it. Most teams will not need it for a while.
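
One way to act on the circular-spawning pitfall before deploying is to treat the Agent allowlists as a directed graph and check it for cycles. The Python sketch below assumes you have already collected each agent's allowlist into a plain dictionary; it does not parse agent files, and `find_spawn_cycle` is a hypothetical helper, not part of Claude Code.

```python
from typing import Dict, List, Optional


def find_spawn_cycle(allowlists: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one spawn cycle as a list of agent names, or None if there is none.

    `allowlists` maps each agent name to the agents it is permitted to spawn.
    """
    in_progress = set()   # agents on the current depth-first path
    finished = set()      # agents whose descendants are fully explored
    path: List[str] = []

    def visit(agent: str) -> Optional[List[str]]:
        if agent in finished:
            return None
        if agent in in_progress:
            # Reached an agent already on the current path: that is a cycle.
            return path[path.index(agent):] + [agent]
        in_progress.add(agent)
        path.append(agent)
        for child in allowlists.get(agent, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        in_progress.discard(agent)
        finished.add(agent)
        return None

    for agent in allowlists:
        cycle = visit(agent)
        if cycle:
            return cycle
    return None


# The circular chain from the pitfall above, expressed as allowlists.
circular = {
    "triage": ["general-researcher"],
    "general-researcher": ["investigation-specialist"],
    "investigation-specialist": ["log-reader"],
    "log-reader": ["general-researcher"],
}
# The tightly scoped configuration from the triage-lead example.
scoped = {
    "triage-lead": ["repro-runner", "log-summariser"],
    "repro-runner": ["log-summariser"],
    "log-summariser": [],
}

print(find_spawn_cycle(circular))
print(find_spawn_cycle(scoped))
```

A check like this only catches cycles that the allowlists permit; it says nothing about whether a given run would actually follow them, so it complements a spend limit and does not replace one.
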