# The Context Window: an LLM's Short-Term Memory, Explained

> Source: <https://dev.to/dev48v/the-context-window-an-llms-short-term-memory-explained-c8c>
> Published: 2026-06-20 07:05:43+00:00

A chatbot feels like it remembers you. It doesn't: the model is stateless. Everything it "knows" about your conversation is just text the app resends on each call, up to a fixed limit called the context window. When the window fills, the oldest messages fall off the edge and are genuinely gone.

🪟 **Watch tokens fall off:** [https://dev48v.infy.uk/ai/days/day8-context-window.html](https://dev48v.infy.uk/ai/days/day8-context-window.html)

```
reply = model(allMessagesSoFar);  // the app resends the whole history every turn
```

"Memory" is just text you keep pasting back in.
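To make that concrete, here is a minimal sketch. `fakeModel` is a hypothetical stand-in for a real LLM API, which would be an asynchronous network call; the stub is synchronous only to keep things short. The important part is that it can only read the array it is handed on this call.

```javascript
// Hypothetical stand-in for an LLM API call. It has no memory of its own:
// it can only read the messages passed in on this call.
function fakeModel(messages) {
  const prefix = "My name is ";
  const named = messages.find(
    (m) => m.role === "user" && m.content.startsWith(prefix)
  );
  return named
    ? `Hello again, ${named.content.slice(prefix.length)}!`
    : "Sorry, I don't know your name.";
}

// The app, not the model, owns the history and resends it every turn.
function sendTurn(history, userText) {
  const messages = [...history, { role: "user", content: userText }];
  const reply = fakeModel(messages);
  return [...messages, { role: "assistant", content: reply }];
}

let history = [];
history = sendTurn(history, "My name is Ada");
history = sendTurn(history, "What is my name?");
const remembered = history[history.length - 1].content;

// Same question, but without resending the history: the "memory" is gone.
const forgotten = fakeModel([{ role: "user", content: "What is my name?" }]);
```

Here `remembered` contains the name only because the first message was sent again on the second turn; `forgotten` is what you get when the app stops pasting the history back in.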

The prompt, the conversation, any pasted docs, and the reply must all fit inside a fixed number of tokens. When the chat grows past that limit, the oldest messages get dropped, typically by the app trimming the history before it sends the request. In the demo, faded messages have scrolled OUT of the window and the model literally can't see them. Ask about something that was dropped and it truly has no idea.
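A rough sketch of that trimming step, under a few assumptions: `estimateTokens` uses the common "about 4 characters per token" rule of thumb for English rather than a real tokenizer, and `fitToWindow`, `windowTokens`, and `replyReserve` are illustrative names, not any library's API.

```javascript
// Rough token estimate: ~4 characters per token is a common rule of thumb
// for English text. A real app would use the model's own tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit in the budget; older ones are dropped.
// `replyReserve` leaves room for the model's answer, which shares the window.
function fitToWindow(messages, windowTokens, replyReserve) {
  const budget = windowTokens - replyReserve;
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break; // everything older falls off the edge
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const chat = [
  { role: "user", content: "My dog is called Biscuit." },
  { role: "assistant", content: "Lovely name for a dog!" },
  { role: "user", content: "What should I cook tonight?" },
  { role: "assistant", content: "How about a simple pasta?" },
  { role: "user", content: "What is my dog called?" },
];

// A deliberately tiny window: 30 tokens total, 10 reserved for the reply.
const visible = fitToWindow(chat, 30, 10);
```

With this toy window only the last three messages survive, so the message that mentions Biscuit never reaches the model, even though the user is asking about it.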

You're billed per token, every call, and because the whole history is resent each turn, you pay for the same text again and again. Pasting a whole book each turn is slow and expensive, so it's not just that you CAN'T fit unlimited text; you don't WANT to.
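A back-of-the-envelope sketch of why this adds up. It assumes, for simplicity, that every turn adds the same number of tokens to the history and that the full history is sent as input each time; `totalInputTokens` and the numbers are illustrative, and real bills also count output tokens.

```javascript
// Total input tokens billed over a conversation when the full history is
// resent every turn. `tokensPerTurn` is how much each turn adds to the
// history, assumed constant to keep the arithmetic simple.
function totalInputTokens(turns, tokensPerTurn) {
  let history = 0;
  let billed = 0;
  for (let t = 0; t < turns; t++) {
    history += tokensPerTurn; // the conversation grows by one turn...
    billed += history;        // ...and the whole thing is sent again
  }
  return billed;
}

const tenTurns = totalInputTokens(10, 200);    // 200 * (1 + 2 + ... + 10) = 11000
const twentyTurns = totalInputTokens(20, 200); // 200 * (1 + 2 + ... + 20) = 42000
```

Doubling the length of the chat roughly quadruples the input tokens billed, because each new turn pays again for everything before it. Multiply by your provider's per-token price to get a cost.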

Even within the limit, models tend to attend best to the START and END of the context; facts buried in the middle of a huge context can be overlooked. Bigger isn't automatically better.
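One mitigation that is sometimes used (an illustration added here, not something the demo does) is to order material so the most important pieces sit at the edges of the context. A minimal sketch, assuming the input array is already sorted from most to least relevant; `relevantToEdges` is a hypothetical helper name.

```javascript
// Takes items sorted from most to least relevant and returns them arranged
// so the best ones sit at the start and end, and the weakest in the middle.
function relevantToEdges(sortedByRelevance) {
  const front = [];
  const back = [];
  sortedByRelevance.forEach((item, i) => {
    if (i % 2 === 0) {
      front.push(item);   // 1st, 3rd, 5th... fill in from the start
    } else {
      back.unshift(item); // 2nd, 4th... fill in from the end
    }
  });
  return front.concat(back);
}

// "a" is the most relevant chunk, "e" the least.
const ordered = relevantToEdges(["a", "b", "c", "d", "e"]);
// ordered is ["a", "c", "e", "d", "b"]
```

Whether this helps depends on the model and the task, so treat it as something to test rather than a guarantee.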

The usual fixes: summarise old turns, keep recent ones verbatim, and use RAG (retrieval-augmented generation) to fetch only the relevant chunks instead of pasting everything. Understanding the window explains chatbot "amnesia" and most prompt-engineering tactics.
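Those three tactics can be combined into one context-building step. The sketch below is a toy under loud assumptions: `summarise` is a hypothetical stub (a real app would usually ask the model to write the summary), and `retrieve` scores chunks by shared words, whereas real RAG systems typically use embedding similarity.

```javascript
// Hypothetical summariser: keeps only the first few words of each old message.
function summarise(messages) {
  return messages
    .map((m) => m.content.split(" ").slice(0, 4).join(" "))
    .join("; ");
}

// Toy retrieval: score each chunk by how many of the question's words it contains.
function retrieve(question, chunks, topK) {
  const tokenize = (s) => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const wanted = new Set(tokenize(question));
  return chunks
    .map((text) => ({
      text,
      score: tokenize(text).filter((w) => wanted.has(w)).length,
    }))
    .filter((c) => c.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.text);
}

// Compact context = summary of old turns + relevant chunks + recent turns verbatim.
function buildContext(history, question, chunks, keepRecent = 2, topK = 1) {
  const cut = Math.max(0, history.length - keepRecent);
  const old = history.slice(0, cut);
  const recent = history.slice(cut);
  const context = [];
  if (old.length > 0) {
    context.push({ role: "system", content: "Summary of earlier chat: " + summarise(old) });
  }
  for (const doc of retrieve(question, chunks, topK)) {
    context.push({ role: "system", content: "Reference: " + doc });
  }
  return [...context, ...recent, { role: "user", content: question }];
}

const docs = [
  "Refunds are accepted within 30 days of purchase.",
  "Shipping takes 3 to 5 business days.",
  "Support is open on weekdays.",
];
const pastChat = [
  { role: "user", content: "Hi, I ordered a kettle last week." },
  { role: "assistant", content: "Thanks! How can I help with the order?" },
  { role: "user", content: "It has not arrived yet." },
  { role: "assistant", content: "Sorry to hear that, let me check." },
];
const context = buildContext(pastChat, "How long does shipping take?", docs);
```

The resulting `context` holds one summary line, the single shipping chunk, the last two turns verbatim, and the new question, instead of the full history plus every document.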
