{"slug": "the-context-window-an-llm-s-short-term-memory-explained", "title": "The Context Window: an LLM's Short-Term Memory, Explained", "summary": "A developer explains that large language models (LLMs) are stateless and their 'memory' is limited to a fixed context window. When the window fills, the oldest messages are dropped and cannot be recalled. The post demonstrates this with a visual demo and discusses implications for cost, performance, and prompt engineering.", "body_md": "A chatbot feels like it remembers you. It doesn't; it's stateless. Everything it \"knows\" is just text resent on each call, up to a fixed limit: the context window. When the window fills, the oldest messages fall off the edge and are genuinely gone.\n\n🪟 **Watch tokens fall off:** [https://dev48v.infy.uk/ai/days/day8-context-window.html](https://dev48v.infy.uk/ai/days/day8-context-window.html)\n\n```\nreply = model(allMessagesSoFar);  // the app resends the whole history every turn\n```\n\n\"Memory\" is just text you keep pasting back in.\n\nPrompt + conversation + pasted docs + the reply must all fit inside a fixed number of tokens. When the chat grows past that limit, the oldest messages get dropped. In the demo, faded messages have scrolled OUT and the model literally can't see them. Ask about something that was dropped and it truly has no idea.\n\nYou're billed per token in the window, every call. Pasting a whole book each turn is slow and expensive, so it's not just that you CAN'T fit unlimited text; you don't WANT to.\n\nEven within the limit, models attend best to the START and END; facts buried in the middle of a huge context can be overlooked. Bigger isn't automatically better.\n\nThe usual workarounds: summarise old turns, keep recent ones verbatim, and use RAG (retrieval-augmented generation) to fetch only the relevant chunks instead of pasting everything. 
Understanding the window explains chatbot \"amnesia\" and most prompt-engineering tactics.", "url": "https://wpnews.pro/news/the-context-window-an-llm-s-short-term-memory-explained", "canonical_source": "https://dev.to/dev48v/the-context-window-an-llms-short-term-memory-explained-c8c", "published_at": "2026-06-20 07:05:43+00:00", "updated_at": "2026-06-20 07:37:24.544554+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "natural-language-processing", "ai-products", "developer-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/the-context-window-an-llm-s-short-term-memory-explained", "markdown": "https://wpnews.pro/news/the-context-window-an-llm-s-short-term-memory-explained.md", "text": "https://wpnews.pro/news/the-context-window-an-llm-s-short-term-memory-explained.txt", "jsonld": "https://wpnews.pro/news/the-context-window-an-llm-s-short-term-memory-explained.jsonld"}}