cd /news/ai-tools/claude-api-tool-versions-response-in… · home topics ai-tools article
[ARTICLE · art-34912] src=byteiota.com ↗ pub= topic=ai-tools verified=true sentiment=↑ positive

Claude API Tool Versions: response_inclusion Cuts Agent Bloat Now

Anthropic updated three Claude API tool versions on June 19, introducing a response_inclusion parameter that strips consumed tool results from agent responses to reduce bloat and costs, and a code execution tool version that discloses the 90-second per-cell limit to Claude for better reasoning. The updates are generally available with no beta header required.

read3 min views1 publishedJun 20, 2026
Claude API Tool Versions: response_inclusion Cuts Agent Bloat Now
Image: Byteiota (auto-discovered)

On June 19, Anthropic updated three Claude API tool versions in the release notes — no blog post, no announcement. The headline feature is response_inclusion

, a new parameter in web_search_20260318

and web_fetch_20260318

that strips consumed tool results from your agent responses. The second is code_execution_20260521

, which finally exposes the 90-second per-cell limit in the tool description so Claude can actually reason about it. Both are GA. No beta header required.

The Agentic Bloat Problem #

Here is what happens in a five-step web research agent without this optimization:

  • Step 1: Claude fetches a webpage — 5,000 tokens of raw HTML land in the result block.
  • Step 2: Claude processes the result, extracts the key data.
  • Step 3: Claude searches for related information — another 3,000 tokens.
  • Step 4: Claude cross-references and synthesizes.
  • Step 5: Final answer returned to your application.

Without response_inclusion

, your API response carries all of those tool result blocks — the raw HTML, the search payloads — even though Claude already consumed them in steps 2 and 4. You are paying output token rates to transmit data your client never asked for and will throw away. At Claude Sonnet 4.6 output pricing ($15 per million tokens), an agent running 1,000 times a day that carries 3,000 unnecessary tokens per run costs an extra $16,000 a year.

What response_inclusion Does #

The new parameter tells the API whether to include consumed tool result blocks in the response. When a search or fetch result was consumed by Claude in the same turn — processed, used, done — you can drop it from the response payload entirely. The underlying execution still happened. Claude still saw the result. You just do not carry the raw output forward.

Upgrade is a one-line change:

tools=[{"type": "web_search_20260209", "name": "web_search"}]

tools=[{
    "type": "web_search_20260318",
    "name": "web_search",
    "response_inclusion": "none"
}]

The same swap applies to web_fetch_20260318

. No beta header. Supported on Claude Fable 5, Opus 4.8, Mythos 5, Opus 4.7, Opus 4.6, and Sonnet 4.6. Check the web search tool documentation for the full parameter reference.

Code Execution: Claude Was Always Flying Blind #

The code execution tool has always had a 90-second per-cell wall-clock limit. Code that exceeds it returns a detection_timeout

result. The problem: Claude had no way to know about this limit from the tool spec itself, so it would generate long-running computations and hit the timeout without any model-level awareness that the constraint existed.

code_execution_20260521

does not change the limit. It discloses it. The tool description Claude reads now states the 90-second constraint explicitly. Claude can now structure multi-step computations across cells to stay within budget, flag operations likely to time out, and reason about execution cost in its planning — rather than writing code and discovering the limit at runtime. See the code execution tool documentation for the full updated spec.

tools=[{"type": "code_execution_20260521", "name": "code_execution"}]

How to Upgrade Today #

All three versions are GA, no beta header required. To upgrade your agent:

  • Replace web_search_20260209

withweb_search_20260318

and add"response_inclusion": "none"

  • Replace web_fetch_20260209

withweb_fetch_20260318

and add"response_inclusion": "none"

  • Replace code_execution_20260120

withcode_execution_20260521

The Claude Developer Platform release notes confirm that dynamic filtering from 20260209

— which cut input tokens by roughly 24% by post-processing search results before they hit the context window — carries forward in 20260318

. You lose nothing, you gain the response optimization.

These are not demo-era features anymore. The Anthropic tool layer is being optimized for production economics, version by version: input tokens in February, output tokens in June. If you are running agents in production with Claude with native tools and have not upgraded your tool versions recently, this is worth a fifteen-minute update pass. The broader guide to controlling token costs in Claude agent workflows is also worth a read if you want to go deeper on context management.

── more in #ai-tools 4 stories · sorted by recency
── more on @anthropic 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/claude-api-tool-vers…] indexed:0 read:3min 2026-06-20 ·