Don't trust large context windows


Source: garrit.xyz · Published: 2026-06-14T06:07Z · Topic: large-language-models


A new analysis warns that large language model context windows degrade significantly beyond 100k tokens, making advertised sizes of 200k to 2M tokens misleading for practical use. Studies like RULER and Chroma's context rot report confirm effective context is a fraction of the advertised number, prompting developers to adopt strategies like session compaction and artifact-based handoffs to maintain performance.

2 min read · published Jun 14, 2026

I recently watched a video that put a name on something I'd been feeling. The author splits an LLM's context window into two zones. There's the smart zone, where the model is sharp, and the dumb zone, where attention drops off and the model starts forgetting what you told it five minutes ago. The cutoff sits somewhere around 100k tokens, no matter how big the advertised context window is.

This matters because coding agents will happily walk you straight into the dumb zone. A modern agent burns through tokens fast. A few file reads, a long debug session, a sprawling test run, and you're at 100k before lunch. Meanwhile vendors keep advertising windows of 200k, 1M, even 2M, as if those numbers represented a usable working set. They don't. Studies like RULER and Chroma's report on context rot show that effective context is a fraction of the advertised number, and that performance degrades gradually as you fill the window.

Large context windows are mostly a marketing number. The architectures behind them work, but they paper over a problem the underlying attention mechanism doesn't really solve. The number on the box gets bigger every release. The usable part doesn't keep up.

Modern agents are getting smart about this. Tools like Claude Code now auto-compact: when the session gets long, the agent summarizes the history and starts fresh. That helps. But auto-compaction kicks in after you've already spent time in the dumb zone, and the summary is itself produced by a model that's already degraded. Better than nothing, but I'd rather avoid the situation altogether.
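The general idea of compaction can be sketched in a few lines. This is an illustration only, not how Claude Code or any other tool implements it: the function names, the `summarize` callback, and the 4-characters-per-token estimate are all assumptions, and the 100k default is just the rough cutoff mentioned above. The point of the soft limit is to trigger compaction before the session drifts into the dumb zone, not after.

```python
from typing import Callable, List

# Rough cutoff taken from the article; not a measured value.
SOFT_LIMIT_TOKENS = 100_000


def estimate_tokens(text: str) -> int:
    """Crude estimate assuming about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def compact(
    history: List[str],
    summarize: Callable[[List[str]], str],
    soft_limit: int = SOFT_LIMIT_TOKENS,
    keep_recent: int = 2,
) -> List[str]:
    """Return history unchanged while under the soft limit.

    Once the estimate exceeds the limit, replace everything except the
    last `keep_recent` messages with a single summary message.
    """
    total = sum(estimate_tokens(message) for message in history)
    if total <= soft_limit or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent
```

In a real agent the `summarize` callback would be a model call, which is exactly where the quality concern above comes from: the earlier it runs, the less degraded the context it summarizes.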

What I do is open a new session and pass it a spec I wrote myself. That's a much higher-signal handoff than any automated summary, because I get to decide what matters going forward. It's the breadcrumb approach applied to agents. Leave an artifact that the next session, or the next person, can pick up cleanly.
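A handoff spec like this can be as simple as a small text file. The sketch below shows one possible shape; the `HandoffSpec` class, its field names, and the section headings are hypothetical and not taken from the original post, which does not say what the spec contains.

```python
from dataclasses import dataclass, field
from typing import List


def _section(title: str, items: List[str]) -> str:
    """Render one bulleted section, with a placeholder when it is empty."""
    body = "\n".join(f"- {item}" for item in items) or "- (none)"
    return f"## {title}\n{body}"


@dataclass
class HandoffSpec:
    # All field names are hypothetical; pick whatever matters for your task.
    goal: str
    decisions: List[str] = field(default_factory=list)
    open_questions: List[str] = field(default_factory=list)
    relevant_files: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the spec as plain text to paste or save for the next session."""
        parts = [
            f"# Goal\n{self.goal}",
            _section("Decisions so far", self.decisions),
            _section("Open questions", self.open_questions),
            _section("Relevant files", self.relevant_files),
        ]
        return "\n\n".join(parts) + "\n"
```

Writing the fields by hand is the point: the person decides what goes under each heading, so the next session starts from a curated summary instead of an automated one.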

You can take this further. Projects like obra/superpowers and mattpocock/skills structure entire agent workflows around small, named artifacts: PRDs, plans, skills, sub-agent handoffs. Each one is a way to keep the working session in the smart zone by deliberately moving information out of the session and into something the next session can read.

So I treat my context window like a budget. I assume only the first chunk is really working for me, and everything I can move out of the live session and into a written artifact is one less thing for attention to fight over.
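The budget framing can be made concrete with a toy calculation. The sketch below assumes you already have rough token counts per item in the session; the function name, the item labels, and the greedy largest-first choice are illustrative assumptions, and the 100k default is the article's approximate cutoff, not a measured limit.

```python
from typing import Dict, List, Tuple

# Approximate size of the "smart zone" as described in the article.
SMART_ZONE_TOKENS = 100_000


def items_to_offload(
    items: Dict[str, int], budget: int = SMART_ZONE_TOKENS
) -> List[Tuple[str, int]]:
    """Pick the largest items to move into written artifacts.

    Items are taken largest-first until the remaining total fits the budget.
    Returns an empty list when the session is already within budget.
    """
    excess = sum(items.values()) - budget
    offload: List[Tuple[str, int]] = []
    for name, tokens in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
        if excess <= 0:
            break
        offload.append((name, tokens))
        excess -= tokens
    return offload


# Hypothetical session contents with made-up token counts.
session = {"test log": 60_000, "file reads": 30_000, "chat": 25_000}
print(items_to_offload(session))  # [('test log', 60000)]
```

With these made-up numbers the session totals 115k tokens, so moving the 60k test log into a file is enough to get back under the budget.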


Read original on garrit.xyz → garrit.xyz/posts/2026-05-06-dont-trust-large-con…
