The Context Window: an LLM's Short-Term Memory, Explained

Source: dev.to · Published: 2026-06-20T07:05Z · Topic: large-language-models

A developer explains that large language models (LLMs) are stateless and their 'memory' is limited to a fixed context window. When the window fills, the oldest messages are dropped and cannot be recalled. The post demonstrates this with a visual demo and discusses implications for cost, performance, and prompt engineering.

1 min read · published Jun 20, 2026

A chatbot feels like it remembers you. It doesn't: the model is stateless. Everything it "knows" about your conversation is just text resent on each call, up to a fixed limit called the context window. When the window fills, the oldest messages fall off the edge and, as far as the model is concerned, are genuinely gone.

🪟 Watch tokens fall off: https://dev48v.infy.uk/ai/days/day8-context-window.html

reply = model(allMessagesSoFar);  // the app resends the whole history every turn

"Memory" is just text you keep pasting back in.
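To make the statelessness concrete, here is a minimal sketch of a chat loop. `fakeModel`, `sendTurn` and `transcript` are hypothetical names invented for this illustration, and `fakeModel` is only a stand-in for a real LLM API call. The point is that the application owns the conversation and passes all of it in on every turn.

```typescript
// Sketch only: `fakeModel` is a hypothetical stand-in for a real LLM call.
type ChatMessage = { role: "user" | "assistant"; content: string };

// The "model" keeps nothing between calls. It can only see its argument.
function fakeModel(messages: ChatMessage[]): string {
  return `I can see ${messages.length} message(s).`;
}

// The application, not the model, holds the conversation state.
const transcript: ChatMessage[] = [];

function sendTurn(userText: string): string {
  transcript.push({ role: "user", content: userText });
  const reply = fakeModel(transcript); // resend the whole history every turn
  transcript.push({ role: "assistant", content: reply });
  return reply;
}

const firstReply = sendTurn("My name is Dev.");
const secondReply = sendTurn("What is my name?");
console.log(firstReply);  // I can see 1 message(s).
console.log(secondReply); // I can see 3 message(s).
```

If `sendTurn` passed only the newest message instead of `transcript`, the second call would again see a single message, and the name would be unrecoverable.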

The prompt, the conversation, any pasted docs, and the reply must all fit inside a fixed number of tokens. When the chat grows past that limit, something has to go, and chat apps typically drop the oldest messages first. In the demo, faded messages have scrolled OUT of the window and the model literally can't see them. Ask about something that was dropped and it truly has no idea.
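One way an app might enforce that budget is sketched below. It makes several assumptions: `countTokens` is a crude one-token-per-word stand-in for a real tokenizer, the window size and reply reservation are made-up numbers, and `fitToWindow` is a hypothetical helper, not any particular library's API.

```typescript
// Sketch only: all names and numbers here are hypothetical.
type ChatTurn = { role: "system" | "user" | "assistant"; content: string };

// Crude stand-in for a tokenizer: one token per whitespace-separated word.
function countTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Keep the system prompt, reserve room for the reply, then add turns from
// newest to oldest until the budget runs out. Everything older is dropped.
function fitToWindow(
  system: ChatTurn,
  turns: ChatTurn[],
  windowSize: number,
  reservedForReply: number
): ChatTurn[] {
  let remaining = windowSize - reservedForReply - countTokens(system.content);
  const kept: ChatTurn[] = [];
  for (const turn of turns.slice().reverse()) {
    const cost = countTokens(turn.content);
    if (cost > remaining) break; // this turn and all older ones fall out
    kept.unshift(turn);
    remaining -= cost;
  }
  return [system, ...kept];
}

const systemTurn: ChatTurn = { role: "system", content: "You are helpful." };
const chatTurns: ChatTurn[] = [
  { role: "user", content: "My favourite colour is teal." },
  { role: "assistant", content: "Noted, teal it is." },
  { role: "user", content: "Tell me a long story about dragons." },
  { role: "assistant", content: "Once upon a time there was a dragon." },
  { role: "user", content: "What is my favourite colour?" },
];

const visible = fitToWindow(systemTurn, chatTurns, 30, 5);
console.log(visible.map((t) => t.content));
```

With these numbers only the system prompt and the last three turns survive, so the final question about the favourite colour reaches the model without the message that answered it.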

With usage-based APIs you're typically billed for every token you send, on every call. Pasting a whole book each turn is slow and expensive, so it's not just that you CAN'T fit unlimited text; you don't WANT to.
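A rough sketch of why resending adds up: assume, purely for illustration, that every user message and every reply is the same length and that nothing is ever trimmed. `cumulativeInputTokens` is a hypothetical helper that counts input tokens only; replies are usually billed as output tokens on top of that.

```typescript
// Sketch only: uniform message sizes are an assumption, not real data.
function cumulativeInputTokens(turnCount: number, tokensPerMessage: number): number {
  let total = 0;
  let transcriptTokens = 0;
  for (let turn = 1; turn <= turnCount; turn++) {
    transcriptTokens += tokensPerMessage; // the new user message
    total += transcriptTokens;            // the whole transcript is sent as input
    transcriptTokens += tokensPerMessage; // the reply joins the transcript
  }
  return total;
}

console.log(cumulativeInputTokens(10, 100));  // 10000
console.log(cumulativeInputTokens(100, 100)); // 1000000
```

Under these assumptions, turn n sends (2n - 1) messages, so the total after n turns is n squared times the message size: ten times as many turns means a hundred times as many input tokens.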

Even within the limit, models tend to attend best to the START and END of the context; facts buried in the middle of a huge context can be overlooked. Bigger isn't automatically better.

The usual workarounds: summarise old turns, keep recent ones verbatim, and use RAG (retrieval-augmented generation) to fetch only the relevant chunks instead of pasting everything. Understanding the window explains chatbot "amnesia" and most prompt-engineering tactics.
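The three tactics can be combined when assembling the text for each call. The sketch below is a toy under stated assumptions: `summarise` just truncates each old turn, where a real app would more likely ask an LLM for a summary, and `retrieve` scores chunks by keyword overlap, where real RAG systems typically use embedding similarity. All function names and sample data are hypothetical.

```typescript
// Sketch only: toy summariser and toy keyword retrieval, hypothetical names.
type HistoryTurn = { role: "user" | "assistant"; content: string };

// Placeholder summary: the first 40 characters of each older turn.
function summarise(olderTurns: HistoryTurn[]): string {
  return olderTurns.map((t) => `${t.role}: ${t.content.slice(0, 40)}`).join(" | ");
}

// Rank chunks by how many of the query's longer words they contain.
function retrieve(query: string, chunks: string[], topK: number): string[] {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return chunks
    .map((chunk) => ({
      chunk,
      score: words.filter((w) => chunk.toLowerCase().indexOf(w) !== -1).length,
    }))
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((scored) => scored.chunk);
}

// Summary of old turns + recent turns verbatim + only the relevant chunks.
function buildPrompt(
  turns: HistoryTurn[],
  docs: string[],
  question: string,
  keepRecent: number
): string {
  const cut = Math.max(0, turns.length - keepRecent);
  const parts: string[] = [];
  if (cut > 0) {
    parts.push("Summary of earlier conversation: " + summarise(turns.slice(0, cut)));
  }
  for (const t of turns.slice(cut)) parts.push(`${t.role}: ${t.content}`);
  const relevant = retrieve(question, docs, 2);
  if (relevant.length > 0) parts.push("Relevant excerpts:\n" + relevant.join("\n"));
  parts.push("user: " + question);
  return parts.join("\n");
}

const pastTurns: HistoryTurn[] = [
  { role: "user", content: "I ordered a kettle last week." },
  { role: "assistant", content: "Thanks, I can see the kettle order." },
  { role: "user", content: "It arrived with a dent." },
  { role: "assistant", content: "Sorry to hear that." },
];
const helpDocs = [
  "Refunds are issued within 14 days of purchase.",
  "Shipping takes 3 to 5 business days.",
  "Our office is closed on public holidays.",
];

const assembled = buildPrompt(pastTurns, helpDocs, "What is the refund window for a purchase?", 2);
console.log(assembled);
```

Here the first two turns are compressed into one summary line, the last two are kept word for word, and only the refund excerpt is pulled in, so the shipping and holiday text never spends tokens.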
