Claude API Tool Versions: response_inclusion Cuts Agent Bloat Now

Anthropic updated three Claude API tool versions on June 19, introducing a response_inclusion parameter that strips consumed tool results from agent responses to reduce bloat and costs, and a code execution tool version that discloses the 90-second per-cell limit to Claude for better reasoning. The updates are generally available with no beta header required.

On June 19, Anthropic updated three Claude API tool versions in the release notes — no blog post, no announcement. The headline feature is response inclusion , a new parameter in web search 20260318 and web fetch 20260318 that strips consumed tool results from your agent responses. The second is code execution 20260521 , which finally exposes the 90-second per-cell limit in the tool description so Claude can actually reason about it. Both are GA. No beta header required. The Agentic Bloat Problem Here is what happens in a five-step web research agent without this optimization: - Step 1: Claude fetches a webpage — 5,000 tokens of raw HTML land in the result block. - Step 2: Claude processes the result, extracts the key data. - Step 3: Claude searches for related information — another 3,000 tokens. - Step 4: Claude cross-references and synthesizes. - Step 5: Final answer returned to your application. Without response inclusion , your API response carries all of those tool result blocks — the raw HTML, the search payloads — even though Claude already consumed them in steps 2 and 4. You are paying output token rates to transmit data your client never asked for and will throw away. At Claude Sonnet 4.6 output pricing $15 per million tokens , an agent running 1,000 times a day that carries 3,000 unnecessary tokens per run costs an extra $16,000 a year. What response inclusion Does The new parameter tells the API whether to include consumed tool result blocks in the response. When a search or fetch result was consumed by Claude in the same turn — processed, used, done — you can drop it from the response payload entirely. The underlying execution still happened. Claude still saw the result. You just do not carry the raw output forward. Upgrade is a one-line change: Before — result blocks returned in response even after Claude consumed them tools= {"type": "web search 20260209", "name": "web search"} After — consumed result blocks dropped from response tools= { "type": "web search 20260318", "name": "web search", "response inclusion": "none" } The same swap applies to web fetch 20260318 . No beta header. Supported on Claude Fable 5, Opus 4.8, Mythos 5, Opus 4.7, Opus 4.6, and Sonnet 4.6. Check the web search tool documentation https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool for the full parameter reference. Code Execution: Claude Was Always Flying Blind The code execution tool has always had a 90-second per-cell wall-clock limit. Code that exceeds it returns a detection timeout result. The problem: Claude had no way to know about this limit from the tool spec itself, so it would generate long-running computations and hit the timeout without any model-level awareness that the constraint existed. code execution 20260521 does not change the limit. It discloses it. The tool description Claude reads now states the 90-second constraint explicitly. Claude can now structure multi-step computations across cells to stay within budget, flag operations likely to time out, and reason about execution cost in its planning — rather than writing code and discovering the limit at runtime. See the code execution tool documentation https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool for the full updated spec. Upgrade code execution to the version that discloses the limit to Claude tools= {"type": "code execution 20260521", "name": "code execution"} How to Upgrade Today All three versions are GA, no beta header required. To upgrade your agent: - Replace web search 20260209 with web search 20260318 and add "response inclusion": "none" - Replace web fetch 20260209 with web fetch 20260318 and add "response inclusion": "none" - Replace code execution 20260120 with code execution 20260521 The Claude Developer Platform release notes https://platform.claude.com/docs/en/release-notes/overview confirm that dynamic filtering from 20260209 — which cut input tokens by roughly 24% by post-processing search results before they hit the context window — carries forward in 20260318 . You lose nothing, you gain the response optimization. These are not demo-era features anymore. The Anthropic tool layer is being optimized for production economics, version by version: input tokens in February, output tokens in June. If you are running agents in production with Claude with native tools and have not upgraded your tool versions recently, this is worth a fifteen-minute update pass. The broader guide to controlling token costs in Claude agent workflows https://www.mindstudio.ai/blog/how-to-control-token-costs-claude-code-dynamic-workflows is also worth a read if you want to go deeper on context management.