{"slug": "context-window-packing-agent-patterns-catalog", "title": "Context Window Packing – Agent Patterns Catalog", "summary": "An agent's assembled context has grown past the model's maximum context window, causing calls to fail when content is naively concatenated and critical information to be lost when it is naively truncated. The team must implement a deliberate packing policy that scores items by recency, relevance, and pinned-status, then fits a budgeted subset while replacing the rest with summaries. This ensures the window stops overflowing and critical state remains visible across turns.", "body_md": "# Context Window Packing\n\n*also known as* Context Compression, Token Budget Management, Fit in Context, Token Cost Reduction\n\nChoose what fits in the context window each turn given a fixed token budget.\n\nThis pattern helps complete certain larger patterns —\n\n- used-by\n[Reasoning Trace Carry-Forward★](/patterns/reasoning-trace-carry-forward/)— For reasoning models that emit a separate reasoning trace, preserve that trace in context across the same logical task episode (across tool-call/result turns) but drop it at user-turn boundaries.\n- used-by\n[Todo-List-Driven Autonomous Agent★](/patterns/todo-list-driven-agent/)— Have the agent author a plan file (e.g. todo.md) early in the run, tick items as it completes them, and re-inject the remaining plan into context; the file serves as both durable plan and working memory.\n\n## Context\n\nAn agent's available context for the next model call — the system prompt, conversation history, retrieved chunks, tool definitions, current state, and any other information the model needs — has grown to the point where it exceeds the model's maximum context window. The team has to decide what goes in and what stays out for every single call.\n\n## Problem\n\n*Naively concatenating everything overflows the window and the call fails. Naively truncating from the start or the end drops information that may be critical (the original task, the most recent tool result, the system prompt itself). 
A first-fit packing strategy leaves the model with a different subset on every call, which makes behaviour unpredictable. The team needs a deliberate policy for what is preserved, what is summarised, what is retrieved on demand, and what is dropped — and that policy has to be applied consistently across calls.*\n\n## Forces\n\n- What to drop is task-dependent.\n- Compression has its own LLM cost.\n- Part of the budget must be reserved for the response itself.\n\n## Example\n\nA long-running support agent has a 200k-token window and a thirty-turn conversation full of tool outputs, two 80-page attached PDFs, and the system charter. Naive concatenation overflows; truncating from the front loses the original ticket; truncating from the back loses the latest turn. The team builds a Context-Window Packing step: each turn it scores items by recency, relevance, and pinned-status, then fits a budgeted subset, replacing the rest with summaries. The window stops overflowing and critical state stays visible.\n\n## Solution\n\n*Therefore:*\n\n**Define a packing policy. Reserve N tokens for system + tools + response. Allocate the rest across history (compressed), retrieved chunks (top-k after rerank), and current state. Use eviction (drop oldest), summarisation (compress), or selection (relevance-rank) policies. Audit token counts before each call.**\n\nWhat this pattern forbids. Total tokens passed to the model must not exceed the window minus the reserved response budget.\n\nThe smaller patterns that complete this one —\n\n- uses\n[Episodic Summaries★★](/patterns/episodic-summaries/)— Compress past episodes into summaries that preserve gist while shedding token cost.\n\nAnd the patterns that stand alongside it, or against it —\n\n- complements\n[Dynamic Scaffolding★](/patterns/dynamic-scaffolding/)— Inject task-specific scaffolding (examples, hints, schemas) into the prompt only when the task type warrants it. 
- alternative-to\n[MemGPT-Style Paging★](/patterns/memgpt-paging/)— Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.\n- alternative-to\n[Salience Attention Mechanism★](/patterns/salience-attention-mechanism/)— Score every candidate memory item with a weighted salience function so each tick attends to a small, relevant top-k subset rather than re-reading all memory.\n- complements\n[Self-Archaeology·](/patterns/self-archaeology/)— Synthesize the agent's past thought history into time-layered trajectory notes so it can articulate how its understanding evolved without recomputing the narrative each time.\n- complements\n[Tool Search Lazy Loading★](/patterns/tool-search-lazy-loading/)— Defer loading tool schemas into the context window until a search step shows they are needed.\n- complements\n[Sleep-Time Compute·](/patterns/sleep-time-compute/)— During idle or downtime, run the model offline against the user's standing context to pre-compute dense summaries and likely future answers, so test-time latency and cost drop when the user actually asks.\n- complements\n[Context Window Dumb-Zone Cap★](/patterns/context-window-dumb-zone/)— Hold context-window utilization below a working threshold (~40%) to keep the model out of the 'dumb zone' where it begins ignoring earlier instructions and hallucinating.\n- complements\n[Landmark Attention·](/patterns/landmark-attention/)— Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.\n- complements\n[Information Chunking for Agent Memory★★](/patterns/information-chunking-memory/)— Structure inputs into digestible topical segments (chunks) before feeding to short-term memory rather than throwing the full input at the model; reduces overload and increases accuracy (~40% improvement observed in customer-service deployment). 
- alternative-to\n[Lost in the Middle (Positional Bias)✕](/patterns/lost-in-the-middle/)— LLM accuracy on retrieving information from long contexts drops sharply when relevant content sits in the middle of the prompt rather than at the start or end.", "url": "https://wpnews.pro/news/context-window-packing-agent-patterns-catalog", "canonical_source": "https://www.agentpatternscatalog.org/patterns/context-window-packing/", "published_at": "2026-05-27 17:26:06+00:00", "updated_at": "2026-05-27 17:46:28.494080+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "natural-language-processing", "ai-infrastructure", "ai-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/context-window-packing-agent-patterns-catalog", "markdown": "https://wpnews.pro/news/context-window-packing-agent-patterns-catalog.md", "text": "https://wpnews.pro/news/context-window-packing-agent-patterns-catalog.txt", "jsonld": "https://wpnews.pro/news/context-window-packing-agent-patterns-catalog.jsonld"}}