# Claude API Tool Versions: response_inclusion Cuts Agent Bloat Now

> Source: <https://byteiota.com/claude-api-tool-versions-response_inclusion-cuts-agent-bloat-now/>
> Published: 2026-06-20 14:23:17+00:00

On June 19, Anthropic updated three Claude API tool versions in the release notes — no blog post, no announcement. The headline feature is `response_inclusion`

, a new parameter in `web_search_20260318`

and `web_fetch_20260318`

that strips consumed tool results from your agent responses. The second is `code_execution_20260521`

, which finally exposes the 90-second per-cell limit in the tool description so Claude can actually reason about it. Both are GA. No beta header required.

## The Agentic Bloat Problem

Here is what happens in a five-step web research agent without this optimization:

- Step 1: Claude fetches a webpage — 5,000 tokens of raw HTML land in the result block.
- Step 2: Claude processes the result, extracts the key data.
- Step 3: Claude searches for related information — another 3,000 tokens.
- Step 4: Claude cross-references and synthesizes.
- Step 5: Final answer returned to your application.

Without `response_inclusion`

, your API response carries all of those tool result blocks — the raw HTML, the search payloads — even though Claude already consumed them in steps 2 and 4. You are paying output token rates to transmit data your client never asked for and will throw away. At Claude Sonnet 4.6 output pricing ($15 per million tokens), an agent running 1,000 times a day that carries 3,000 unnecessary tokens per run costs an extra $16,000 a year.

## What response_inclusion Does

The new parameter tells the API whether to include consumed tool result blocks in the response. When a search or fetch result was consumed by Claude in the same turn — processed, used, done — you can drop it from the response payload entirely. The underlying execution still happened. Claude still saw the result. You just do not carry the raw output forward.

Upgrade is a one-line change:

```
# Before — result blocks returned in response even after Claude consumed them
tools=[{"type": "web_search_20260209", "name": "web_search"}]

# After — consumed result blocks dropped from response
tools=[{
    "type": "web_search_20260318",
    "name": "web_search",
    "response_inclusion": "none"
}]
```

The same swap applies to `web_fetch_20260318`

. No beta header. Supported on Claude Fable 5, Opus 4.8, Mythos 5, Opus 4.7, Opus 4.6, and Sonnet 4.6. Check the [web search tool documentation](https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool) for the full parameter reference.

## Code Execution: Claude Was Always Flying Blind

The code execution tool has always had a 90-second per-cell wall-clock limit. Code that exceeds it returns a `detection_timeout`

result. The problem: Claude had no way to know about this limit from the tool spec itself, so it would generate long-running computations and hit the timeout without any model-level awareness that the constraint existed.

`code_execution_20260521`

does not change the limit. It discloses it. The tool description Claude reads now states the 90-second constraint explicitly. Claude can now structure multi-step computations across cells to stay within budget, flag operations likely to time out, and reason about execution cost in its planning — rather than writing code and discovering the limit at runtime. See the [code execution tool documentation](https://platform.claude.com/docs/en/agents-and-tools/tool-use/code-execution-tool) for the full updated spec.

```
# Upgrade code execution to the version that discloses the limit to Claude
tools=[{"type": "code_execution_20260521", "name": "code_execution"}]
```

## How to Upgrade Today

All three versions are GA, no beta header required. To upgrade your agent:

- Replace
`web_search_20260209`

with`web_search_20260318`

and add`"response_inclusion": "none"`

- Replace
`web_fetch_20260209`

with`web_fetch_20260318`

and add`"response_inclusion": "none"`

- Replace
`code_execution_20260120`

with`code_execution_20260521`

The [Claude Developer Platform release notes](https://platform.claude.com/docs/en/release-notes/overview) confirm that dynamic filtering from `20260209`

— which cut input tokens by roughly 24% by post-processing search results before they hit the context window — carries forward in `20260318`

. You lose nothing, you gain the response optimization.

These are not demo-era features anymore. The Anthropic tool layer is being optimized for production economics, version by version: input tokens in February, output tokens in June. If you are running agents in production with Claude with native tools and have not upgraded your tool versions recently, this is worth a fifteen-minute update pass. The [broader guide to controlling token costs in Claude agent workflows](https://www.mindstudio.ai/blog/how-to-control-token-costs-claude-code-dynamic-workflows) is also worth a read if you want to go deeper on context management.
