The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models


A study of three 1-3B instruction-tuned language models finds that, at the answer-readout stage, chain-of-thought (CoT) prompting for arithmetic relies on a positional shortcut: the model copies whichever number appears last before the answer delimiter, regardless of the intermediate reasoning. Gold-answer presence accounts for 89-92% of each model's teacher-forcing accuracy ceiling on GSM8K, and replacing the trailing number with a wrong value collapses accuracy to near zero even when the correct intermediate steps remain. The findings suggest that step-level faithfulness evaluations risk conflating positional answer transport with genuine computation, a failure mode for CoT-based oversight.

Published May 25, 2026

arXiv:2605.22870v1. Abstract: Chain-of-thought (CoT) prompting is necessary for arithmetic in small language models, yet shuffling its steps preserves most performance. What does CoT contribute if not logical sequencing? In three 1-3B instruction-tuned LMs on GSM8K, we isolate the answer-readout stage via prefix completion and identify a positional shortcut: the model copies whichever number occupies the trailing position before the answer delimiter, regardless of intermediate reasoning. Gold-answer presence accounts for 54-92 pp of accuracy (89-92% of each model's teacher-forcing ceiling); even on incorrect items, the final answer matches the last CoT number 95-96% of the time. The copy channel takes precedence over retained-context completion: replacing the trailing number with a wrong value collapses accuracy to near-zero despite correct intermediates, yet removing it recovers 5-32 pp above that floor. Even single-step arithmetic the model can otherwise perform is suppressed when a copyable number is present. Qwen and Llama copy novel distractors 87-95% of the time; Gemma gates selectively. Head-level ablation implicates architecture-specific head sets; the effect replicates on GSM-Symbolic. On non-arithmetic BBH tasks, shuffle retention drops sharply; at 7-8B, content-selective gating emerges. Step-level faithfulness evaluations risk conflating positional answer transport with genuine computation, a failure mode for CoT-based oversight.
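The prefix-completion probes described in the abstract can be pictured with a small sketch. This is not the authors' code: the delimiter string "The answer is", the number regex, and the helper names `find_numbers`, `last_number`, `make_prefixes`, and `copy_rate` are all assumptions made for illustration. It builds the three prompt variants the abstract contrasts (trailing number intact, replaced with a wrong value, removed) and measures how often a model's final answer equals the last number in its CoT.

```python
import re

# Assumption: numbers are unsigned integers or decimals, optionally with
# thousands separators (e.g. "18", "3.5", "1,200").
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def find_numbers(text):
    """Return (start, end, value) for every number in text, in order."""
    return [(m.start(), m.end(), m.group().replace(",", ""))
            for m in NUMBER.finditer(text)]


def last_number(cot):
    """Return the (start, end, value) of the trailing number, or None."""
    nums = find_numbers(cot)
    return nums[-1] if nums else None


def make_prefixes(question, cot, wrong_value, delimiter="\nThe answer is"):
    """Build three prefix-completion prompts from one chain of thought.

    The delimiter is a hypothetical choice; the abstract does not state it.
    """
    hit = last_number(cot)
    if hit is None:
        raise ValueError("CoT contains no number to perturb")
    start, end, _ = hit
    return {
        # Unmodified CoT followed by the answer delimiter.
        "intact": f"{question}\n{cot}{delimiter}",
        # Trailing number swapped for a wrong value; earlier steps untouched.
        "wrong_trailing": f"{question}\n{cot[:start]}{wrong_value}{cot[end:]}{delimiter}",
        # Trailing number deleted, so there is nothing left to copy.
        "removed": f"{question}\n{cot[:start].rstrip()}{cot[end:]}{delimiter}",
    }


def copy_rate(cots, answers):
    """Fraction of items whose final answer equals the last CoT number."""
    scored = [(last_number(c), a) for c, a in zip(cots, answers)]
    scored = [(hit[2], a) for hit, a in scored if hit is not None]
    if not scored:
        return 0.0
    matches = sum(float(n) == float(a.replace(",", "")) for n, a in scored)
    return matches / len(scored)
```

Each prompt from `make_prefixes` would be passed to a model for completion, and accuracy compared across the three variants; `copy_rate` corresponds to the last-number match statistic, given the model's CoTs and its final answers as strings.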

source & further reading


Read original on arxiv.org → arxiv.org/abs/2605.22870

