# Claude Code 2.1.154–2.1.158 streaming tool-result delivery regression — probe + side-by-side evidence + workaround (and what you give up by pinning to 2.1.153)

> Source: <https://gist.github.com/0xdhx/d7086a66b48bbdf6047950a1707801cc>
> Published: 2026-05-31 03:23:22+00:00

**Date:** 2026-05-30 • **Verified clean build:** 2.1.153 • **Broken builds:** 2.1.154–2.1.158 (clean 2.1.157-good / 2.1.158-bad bisect from another reporter; symptoms predate 2.1.158)

CC 2.1.154–2.1.158 corrupts the path that delivers tool results back to the
model. Execution is clean — commands run exactly once and disk state is
correct. The channel that returns results is what's broken: results arrive
empty-then-late in bursts, the tail of a parallel batch renders ~15×
duplicated, `Read`

sometimes returns phantom content (a 76-line file reported
as 5333 lines, with a row that wasn't in the file), and results flush out of
order.

Downstream: burns context + output tokens, invalidates prompt-cache hits, and provokes model-side amplification — busy-wait polling and fabricated results that drive retry cascades.

**Suspect (temporal, not maintainer-confirmed):** the 2.1.154 changelog entry
"Streaming tool execution is now always enabled, including when telemetry is
disabled or on Bedrock/Vertex/Foundry (previously behind a feature flag)" lines
up exactly with when the cross-platform cluster ignites.

The trick: append a label to a log file *before* echoing. The log is ground
truth of what executed (immune to the delivery channel); stdout is what the
delivery channel returned. Comparing the two cleanly separates "did the
command run?" from "did the result arrive intact?"

``` bash
#!/usr/bin/env bash
# /tmp/toolprobe/probe.sh
log=/tmp/toolprobe/log.txt
echo "$(date +%s.%N) $1" >> "$log"
echo "PROBE_OK label=$1 count=$(wc -l < "$log")"
```

Run N labels as a single parallel Bash batch, then compare per-label
execution count (lines in `log.txt`

) vs inline result blocks received.

The probe above was built *because* the 2.1.158 session corrupted tool-result
delivery mid-task. So this side is qualitative — what the original broken
session produced before the probe existed:

**Tail of a parallel Bash batch rendered ~15× duplicated** in the inline result blocks the model received.**Empty-then-burst arrival lagged ~1 turn**— multiple parallel calls returned blank, then their content flushed together later.including a fabricated duplicate row not in the source file.`Read`

of a 76-line file reported 5333 lines`wc -l`

on the same file from a sibling Bash call returned 76.- Cross-checking ground truth (a log file appended to from inside each command) confirmed
**every command executed exactly once.** Only the channel handing results back to the model was corrupted.

The reason there isn't a structured 2.1.158 probe run below: surviving a corrupted-delivery session well enough to drive a structured probe is itself hard — the corruption shreds the harness state you'd use to script it. The clean 2.1.153 run below was the first opportunity to instrument cleanly.

```
claude --version: 2.1.153 (Claude Code)
date: Sat May 30 19:46:13 CDT 2026
PROBE_OK label=R1 count=1
Call A → PROBE_OK label=A count=2
Call B → PROBE_OK label=B count=3
Call C → PROBE_OK label=C count=4
Call D → PROBE_OK label=D count=5
Call E → PROBE_OK label=E count=6
Call F → PROBE_OK label=F count=7
Call G → PROBE_OK label=G count=8
Call H → PROBE_OK label=H count=9
```

(Every label's `count=`

value was strictly monotonic 1..25 across all three
parallel batches — no two parallel calls reported the same count, no gaps,
no dupes.)

```
seq 1 500 > /tmp/toolprobe/big.txt
```

Read tool on `big.txt`

: reported 501 numbered rows. Rows 1..500 contained
the exact values "1".."500"; row 501 was empty (trailing-newline artifact —
`wc -l`

reports 500, matching). **No phantom rows, no fabrication.**

``` bash
$ wc -l /tmp/toolprobe/log.txt
25 /tmp/toolprobe/log.txt

$ awk '{print $2}' /tmp/toolprobe/log.txt | sort | uniq -c
      1 A   1 B   1 C   1 D   1 E   1 F   1 G   1 H
      1 I   1 J   1 K   1 L   1 M   1 N   1 O   1 P
      1 Q   1 R   1 S   1 T   1 U   1 V   1 W   1 X
      1 R1
```

| Label | exec count (log) | inline result blocks received |
|---|---|---|
| R1, A–X (25 total) | 1 each | 1 each |

**Distinct labels: 25. Max count any label reached: 1. Dupes: 0. Empties: 0. Cancels: 0.** Read fidelity exact.

n=1 — the bug is intermittent, so one clean pass on 2.1.153 isn't absolute proof of absence. But it converges with the changelog-timed hypothesis and with upstream's 2.1.157-good / 2.1.158-bad bisect.

— Linux/WSL primary.[#63797](https://github.com/anthropics/claude-code/issues/63797)**Reproduced fresh/short session after reboot**, which empirically rules out the naive 64KB-pipe-buffer theory and points at the streaming pipeline itself.— macOS, fullest forensics.[#63966](https://github.com/anthropics/claude-code/issues/63966)**88/88** Proves delivery lag, not execution loss. One`tool_use`

↔`tool_result`

paired in JSONL (0 orphans).`Read`

had ~15 intervening tool uses before its result.— bisect tripwire. Clean[#63935](https://github.com/anthropics/claude-code/issues/63935)**2.1.157-good / 2.1.158-bad**;`/clear`

does not fix; downgrade to 2.1.157 resolves the worst form. We target 2.1.153 not 2.1.157 because streaming-always-on dates to 2.1.154.— multi-minute buffering → model busy-waits with ~500 no-op poll commands.[#64077](https://github.com/anthropics/claude-code/issues/64077)— out-of-order results with stale batch data when a parallel batch partially fails.[#63859](https://github.com/anthropics/claude-code/issues/63859)/[#63538](https://github.com/anthropics/claude-code/issues/63538)/[#63884](https://github.com/anthropics/claude-code/issues/63884)/[#64065](https://github.com/anthropics/claude-code/issues/64065)— model-behavior facet: when a batch looks empty/cancelled, the model fabricates output (once even a fake user instruction).[#64076](https://github.com/anthropics/claude-code/issues/64076)

Year-long lineage of the same delivery-not-execution class: ** #13984** (2025-12) →

**(canonical precedent, closed not_planned) → the current 2.1.154+ cluster.**

[#36038](https://github.com/anthropics/claude-code/issues/36038)**Clean window:** Agent View requires ≥2.1.139; regression starts at 2.1.154.
So 2.1.153 keeps Agent View and predates the bug.

**Disable the auto-updater FIRST**(else the supervisor binary-watch re-bumps you). Add to`~/.claude/settings.json`

:There is

```
{ "env": { "DISABLE_AUTOUPDATER": "1" } }
```

**no**`autoUpdates`

settings key —`DISABLE_AUTOUPDATER`

is an env var inside`settings.json`

. Do**not** use`minimumVersion`

— it sets a floor that would block the downgrade.**Downgrade** with the built-in installer (no npm):

```
claude install 2.1.153
```

**Restart.**`claude respawn --all`

for background sessions. Verify`claude --version`

→ 2.1.153.

**Rollback:** `claude install 2.1.158`

+ remove `DISABLE_AUTOUPDATER`

.

This is non-trivial — Anthropic shipped Opus 4.8 in the same release that broke delivery. From the official CC changelog, here is everything 2.1.154–2.1.158 added that you lose by pinning to 2.1.153:

**Opus 4.8**—*"Opus 4.8 is here! Now defaults to high effort ·*Pinning means your top model is Opus 4.7.`/effort xhigh`

for your hardest tasks."**Dynamic workflows**—`/workflows`

orchestrates work across tens to hundreds of agents in the background.- Fast mode on Opus 4.8 at
*"a fraction of its previous cost: 2x the standard rate for 2.5x the speed."* - Lean system prompt as default for all models except Haiku, Sonnet, and Opus 4.7 and earlier.
`/simplify`

runs a cleanup-only review (reuse/simplification/efficiency) instead of full`/code-review --fix`

.`claude agents`

:`! <command>`

to run a shell command as a background session you can attach to / detach from.- Improved auto-mode classifier detection of bulk-repo data-exfil patterns.
- Claude Opus 4.8 support added to the
`/claude-api`

skill.

- Fix for Opus 4.8 thinking-block modification causing API errors.

- Plugins in
`.claude/skills/`

auto-load (no marketplace required). `claude plugin init <name>`

to scaffold a new plugin in`.claude/skills`

.`/plugin`

autocomplete for subcommands, installed plugin names, and marketplaces.`claude agents`

:`agent`

field in`settings.json`

honored for dispatched sessions,`--agent <name>`

to override.`EnterWorktree`

can switch between Claude-managed worktrees mid-session.- Several IDE / VSCode / WSL fixes (image paste on WSL,
`--resume`

background-subagent reporting, etc.).

- Auto mode on Bedrock/Vertex/Foundry for Opus 4.7 + 4.8 via
`CLAUDE_CODE_ENABLE_AUTO_MODE=1`

.

**Pin to 2.1.153** if reliable tool-result delivery matters more than Opus 4.8 + dynamic workflows for your work.**Stay on 2.1.158** if Opus 4.8 or dynamic workflows are load-bearing for you, and ride the mitigations checklist below. The bug is real on that path; you'd be managing it, not avoiding it.

**Redirect to file, then Read.**`cmd > /tmp/out 2>&1`

then read the file. Most-cited workaround (history of the same class: #27004, #63966).**Defeat block-buffering when piping.**`stdbuf -oL`

/`-o0`

,`unbuffer`

,`grep --line-buffered`

,`PYTHONUNBUFFERED=1`

, or a PTY.**Keep stdout small.**`BASH_MAX_OUTPUT_LENGTH`

.**Don't poll a pending result.** Polling can't pull a buffered result through; it only burns turns (#64077).**Don't fabricate to fill a gap.** Empty = "delivery lag, re-verify", never confabulate (this is what produces the model-side fabrication cascade).**Fewer, fatter tool calls.** Burst delivery hurts wide parallel fan-outs more than serial ones.**Operational palliatives:**`/compact`

, restart between tasks, move heavy work to a subagent (fresh subprocess often works when the main loop is wedged).

`CLAUDE_CODE_DISABLE_NONSTREAMING_FALLBACK=1`

— Anthropic's own env-var docs say the streaming→non-streaming fallback can "produce duplicate tool execution." Kills the duplicating fallback but not the underlying pipeline.`CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING=0`

— input-side only; doesn't address tool-result delivery.`/clear`

— does not fix (#63935).- "Wait it out" — execution finished; the channel won't unstick on its own.

- The "streaming-always-on @ 2.1.154 ↔ regression cluster" link is
**temporal/circumstantial**, not maintainer-confirmed. No maintainer reply on the cluster issues as of writing. - Pure 64KB-pipe-buffer exhaustion was
**empirically contradicted** as the cluster trigger (fresh/short sessions repro per #63797). Still explains older "worsens as session grows" reports like #36038. - Env vars seen only in third-party blog lists (
`CLAUDE_CODE_EAGER_FLUSH`

,`CLAUDE_STREAM_IDLE_TIMEOUT_MS`

, etc.) are**not** on the official env-vars page — don't rely on them unverified. - Latest CC at write-time =
**2.1.158**; its changelog has** no**entry addressing tool-result delivery/buffering.
