Context Recycling for Long-Horizon LLM Inference

Researchers introduced ContextForge, a system that recycles context for long-horizon LLM inference by combining structured query generation, external memory retrieval, and controlled synthesis. On a 15-turn conversational benchmark of structured healthcare queries, ContextForge used fewer tokens and gave more consistent responses than a baseline agent built on the same underlying models, with comparable response accuracy. The authors suggest the approach can extend LLM capabilities on long-horizon tasks without requiring larger context windows or model retraining.

arXiv:2606.26105v1

Abstract: Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextForge, a system for context recycling that maintains task-relevant information across turns by combining structured query generation, external memory retrieval, and controlled synthesis. The system enables efficient reuse of prior computation without relying on full context replay, reducing token overhead while preserving answer quality. We evaluate ContextForge using a 15-turn conversational benchmark that tests multi-turn reasoning, back-references, and domain shifts across structured healthcare queries. Compared to a baseline agent using identical underlying models, ContextForge demonstrates improved consistency and reduced token consumption, while maintaining comparable response accuracy. These results suggest that context recycling provides a practical approach for extending LLM capabilities in long-horizon tasks without requiring larger context windows or model retraining. Code and evaluation artifacts are available at https://github.com/Betanu701/ContextForge.
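The abstract describes the pipeline only at a high level. The following is a minimal Python sketch of the general idea it contrasts: full context replay, where every prior turn is resent, versus recycling, where only relevant stored context is retrieved. It is not ContextForge's implementation; the repository linked above holds the authors' code. All names (count_tokens, make_query, Memory, full_replay_prompt, recycled_prompt, run) are hypothetical. The sketch also assumes: tokens are approximated by whitespace-separated words; the "structured query" is just the set of lowercase words longer than three characters; retrieval ranks stored turns by word overlap; and "synthesis" is reduced to concatenating the top-k retrieved entries with the new turn. No model is called.

```python
"""Sketch: context recycling vs. full context replay (hypothetical, simplified).

Not ContextForge's method. Assumptions: whitespace words stand in for tokens,
keyword overlap stands in for retrieval, concatenation stands in for synthesis.
"""
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Crude proxy: one "token" per whitespace-separated word.
    return len(text.split())


def make_query(turn: str) -> set[str]:
    # Hypothetical structured query: lowercase content words (> 3 chars),
    # with surrounding punctuation stripped.
    words = (w.strip(".,?!:;").lower() for w in turn.split())
    return {w for w in words if len(w) > 3}


@dataclass
class Memory:
    # External memory; here it stores raw turns. A real system might store
    # model answers or summaries instead (assumption).
    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, query: set[str], k: int) -> list[str]:
        # Score each entry by keyword overlap; drop zero-overlap entries;
        # keep the top-k, breaking ties by earliest insertion order.
        scored = [(len(query & make_query(e)), i, e)
                  for i, e in enumerate(self.entries)]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [e for _, _, e in scored[:k]]


def full_replay_prompt(history: list[str], turn: str) -> str:
    # Baseline: resend every prior turn plus the new one.
    return "\n".join(history + [turn])


def recycled_prompt(memory: Memory, turn: str, k: int = 2) -> str:
    # Recycling: send only the k most relevant stored entries plus the new turn.
    context = memory.retrieve(make_query(turn), k)
    return "\n".join(context + [turn])


def run(turns: list[str], k: int = 2) -> tuple[int, int]:
    # Return cumulative prompt "tokens" for (full replay, recycling).
    history: list[str] = []
    memory = Memory()
    replay_total = recycled_total = 0
    for turn in turns:
        replay_total += count_tokens(full_replay_prompt(history, turn))
        recycled_total += count_tokens(recycled_prompt(memory, turn, k))
        history.append(turn)
        memory.add(turn)
    return replay_total, recycled_total


if __name__ == "__main__":
    turns = [
        "Patient reports chest pain and shortness of breath.",
        "What medications treat chest pain?",
        "Switch topic: list common diabetes screening tests.",
        "Back to the chest pain patient: which tests confirm angina?",
    ]
    print(run(turns, k=1))  # → (71, 46)
```

In this toy run the domain shift (the diabetes turn) retrieves nothing, and the back-reference in the last turn retrieves the original chest-pain turn rather than the whole history. Because the replay prompt at turn t contains all t−1 earlier turns, its cumulative cost grows roughly quadratically with conversation length, while a fixed top-k budget grows roughly linearly. Whether answer quality survives that pruning depends on retrieval quality, which is what the paper's benchmark evaluates.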