Date: 2026-05-30 • Verified clean build: 2.1.153 • Broken builds: 2.1.154–2.1.158 (clean 2.1.157-good / 2.1.158-bad bisect from another reporter; symptoms predate 2.1.158)
CC 2.1.154–2.1.158 corrupts the path that delivers tool results back to the
model. Execution is clean — commands run exactly once and disk state is
correct. The channel that returns results is what's broken: results arrive
empty-then-late in bursts, the tail of a parallel batch renders ~15×
duplicated, Read
sometimes returns phantom content (a 76-line file reported as 5333 lines, with a row that wasn't in the file), and results flush out of order.
Downstream: burns context + output tokens, invalidates prompt-cache hits, and provokes model-side amplification — busy-wait polling and fabricated results that drive retry cascades.
Suspect (temporal, not maintainer-confirmed): the 2.1.154 changelog entry "Streaming tool execution is now always enabled, including when telemetry is disabled or on Bedrock/Vertex/Foundry (previously behind a feature flag)" lines up exactly with when the cross-platform cluster ignites.
The trick: append a label to a log file before echoing. The log is ground truth of what executed (immune to the delivery channel); stdout is what the delivery channel returned. Comparing the two cleanly separates "did the command run?" from "did the result arrive intact?"
#!/usr/bin/env bash
log=/tmp/toolprobe/log.txt
echo "$(date +%s.%N) $1" >> "$log"
echo "PROBE_OK label=$1 count=$(wc -l < "$log")"
Run N labels as a single parallel Bash batch, then compare per-label
execution count (lines in log.txt
) vs inline result blocks received.
The probe above was built because the 2.1.158 session corrupted tool-result delivery mid-task. So this side is qualitative — what the original broken session produced before the probe existed:
Tail of a parallel Bash batch rendered ~15× duplicated in the inline result blocks the model received.Empty-then-burst arrival lagged ~1 turn— multiple parallel calls returned blank, then their content flushed together later.including a fabricated duplicate row not in the source file.Read
of a 76-line file reported 5333 lineswc -l
on the same file from a sibling Bash call returned 76.- Cross-checking ground truth (a log file appended to from inside each command) confirmed every command executed exactly once. Only the channel handing results back to the model was corrupted.
The reason there isn't a structured 2.1.158 probe run below: surviving a corrupted-delivery session well enough to drive a structured probe is itself hard — the corruption shreds the harness state you'd use to script it. The clean 2.1.153 run below was the first opportunity to instrument cleanly.
claude --version: 2.1.153 (Claude Code)
date: Sat May 30 19:46:13 CDT 2026
PROBE_OK label=R1 count=1
Call A → PROBE_OK label=A count=2
Call B → PROBE_OK label=B count=3
Call C → PROBE_OK label=C count=4
Call D → PROBE_OK label=D count=5
Call E → PROBE_OK label=E count=6
Call F → PROBE_OK label=F count=7
Call G → PROBE_OK label=G count=8
Call H → PROBE_OK label=H count=9
(Every label's count=
value was strictly monotonic 1..25 across all three parallel batches — no two parallel calls reported the same count, no gaps, no dupes.)
seq 1 500 > /tmp/toolprobe/big.txt
Read tool on big.txt
: reported 501 numbered rows. Rows 1..500 contained
the exact values "1".."500"; row 501 was empty (trailing-newline artifact —
wc -l
reports 500, matching). No phantom rows, no fabrication.
$ wc -l /tmp/toolprobe/log.txt
25 /tmp/toolprobe/log.txt
$ awk '{print $2}' /tmp/toolprobe/log.txt | sort | uniq -c
1 A 1 B 1 C 1 D 1 E 1 F 1 G 1 H
1 I 1 J 1 K 1 L 1 M 1 N 1 O 1 P
1 Q 1 R 1 S 1 T 1 U 1 V 1 W 1 X
1 R1
| Label | exec count (log) | inline result blocks received |
|---|---|---|
| R1, A–X (25 total) | 1 each | 1 each |
Distinct labels: 25. Max count any label reached: 1. Dupes: 0. Empties: 0. Cancels: 0. Read fidelity exact.
n=1 — the bug is intermittent, so one clean pass on 2.1.153 isn't absolute proof of absence. But it converges with the changelog-timed hypothesis and with upstream's 2.1.157-good / 2.1.158-bad bisect.
— Linux/WSL primary.#63797Reproduced fresh/short session after reboot, which empirically rules out the naive 64KB-pipe-buffer theory and points at the streaming pipeline itself.— macOS, fullest forensics.#6396688/88 Proves delivery lag, not execution loss. Onetool_use
↔tool_result
paired in JSONL (0 orphans).Read
had ~15 intervening tool uses before its result.— bisect tripwire. Clean#639352.1.157-good / 2.1.158-bad;/clear
does not fix; downgrade to 2.1.157 resolves the worst form. We target 2.1.153 not 2.1.157 because streaming-always-on dates to 2.1.154.— multi-minute buffering → model busy-waits with ~500 no-op poll commands.#64077— out-of-order results with stale batch data when a parallel batch partially fails.#63859/#63538/#63884/#64065— model-behavior facet: when a batch looks empty/cancelled, the model fabricates output (once even a fake user instruction).#64076
Year-long lineage of the same delivery-not-execution class: ** #13984** (2025-12) →
(canonical precedent, closed not_planned) → the current 2.1.154+ cluster.
#36038Clean window: Agent View requires ≥2.1.139; regression starts at 2.1.154. So 2.1.153 keeps Agent View and predates the bug.
Disable the auto-updater FIRST(else the supervisor binary-watch re-bumps you). Add to~/.claude/settings.json
:There is
{ "env": { "DISABLE_AUTOUPDATER": "1" } }
noautoUpdates
settings key —DISABLE_AUTOUPDATER
is an env var insidesettings.json
. Donot useminimumVersion
— it sets a floor that would block the downgrade.Downgrade with the built-in installer (no npm):
claude install 2.1.153
Restart.claude respawn --all
for background sessions. Verifyclaude --version
→ 2.1.153.
Rollback: claude install 2.1.158
- remove
DISABLE_AUTOUPDATER
.
This is non-trivial — Anthropic shipped Opus 4.8 in the same release that broke delivery. From the official CC changelog, here is everything 2.1.154–2.1.158 added that you lose by pinning to 2.1.153:
Opus 4.8—*"Opus 4.8 is here! Now defaults to high effort ·*Pinning means your top model is Opus 4.7./effort xhigh
for your hardest tasks."Dynamic workflows—/workflows
orchestrates work across tens to hundreds of agents in the background.- Fast mode on Opus 4.8 at
"a fraction of its previous cost: 2x the standard rate for 2.5x the speed." - Lean system prompt as default for all models except Haiku, Sonnet, and Opus 4.7 and earlier.
/simplify
runs a cleanup-only review (reuse/simplification/efficiency) instead of full/code-review --fix
.claude agents
:! <command>
to run a shell command as a background session you can attach to / detach from.- Improved auto-mode classifier detection of bulk-repo data-exfil patterns.
- Claude Opus 4.8 support added to the
/claude-api
skill.
-
Fix for Opus 4.8 thinking-block modification causing API errors.
-
Plugins in
.claude/skills/
auto-load (no marketplace required). claude plugin init <name>
to scaffold a new plugin in.claude/skills
./plugin
autocomplete for subcommands, installed plugin names, and marketplaces.claude agents
:agent
field insettings.json
honored for dispatched sessions,--agent <name>
to override.EnterWorktree
can switch between Claude-managed worktrees mid-session.- Several IDE / VSCode / WSL fixes (image paste on WSL,
--resume
background-subagent reporting, etc.).
- Auto mode on Bedrock/Vertex/Foundry for Opus 4.7 + 4.8 via
CLAUDE_CODE_ENABLE_AUTO_MODE=1
.
Pin to 2.1.153 if reliable tool-result delivery matters more than Opus 4.8 + dynamic workflows for your work.Stay on 2.1.158 if Opus 4.8 or dynamic workflows are load-bearing for you, and ride the mitigations checklist below. The bug is real on that path; you'd be managing it, not avoiding it.
Redirect to file, then Read.cmd > /tmp/out 2>&1
then read the file. Most-cited workaround (history of the same class: #27004, #63966).Defeat block-buffering when piping.stdbuf -oL
/-o0
,unbuffer
,grep --line-buffered
,PYTHONUNBUFFERED=1
, or a PTY.Keep stdout small.BASH_MAX_OUTPUT_LENGTH
.Don't poll a pending result. Polling can't pull a buffered result through; it only burns turns (#64077).Don't fabricate to fill a gap. Empty = "delivery lag, re-verify", never confabulate (this is what produces the model-side fabrication cascade).Fewer, fatter tool calls. Burst delivery hurts wide parallel fan-outs more than serial ones.Operational palliatives:/compact
, restart between tasks, move heavy work to a subagent (fresh subprocess often works when the main loop is wedged).
CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK=1
— Anthropic's own env-var docs say the streaming→non-streaming fallback can "produce duplicate tool execution." Kills the duplicating fallback but not the underlying pipeline.CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING=0
— input-side only; doesn't address tool-result delivery./clear
— does not fix (#63935).- "Wait it out" — execution finished; the channel won't unstick on its own.
- The "streaming-always-on @ 2.1.154 ↔ regression cluster" link is
temporal/circumstantial, not maintainer-confirmed. No maintainer reply on the cluster issues as of writing. - Pure 64KB-pipe-buffer exhaustion was
empirically contradicted as the cluster trigger (fresh/short sessions repro per #63797). Still explains older "worsens as session grows" reports like #36038. - Env vars seen only in third-party blog lists (
CLAUDE_CODE_EAGER_FLUSH
,CLAUDE_STREAM_IDLE_TIMEOUT_MS
, etc.) arenot on the official env-vars page — don't rely on them unverified. - Latest CC at write-time = 2.1.158; its changelog has** no**entry addressing tool-result delivery/buffering.