Microsoft Taught a Reasoning Model to Compress Its Own Thoughts Mid-Generation

Microsoft researchers developed Memento (April 2026), a method that trains large language models to compress their own reasoning mid-generation: the model evicts blocks of reasoning from its KV cache and replaces them with compact summaries, cutting computational cost.

The full article appears on Towards AI.
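The eviction-and-summary idea can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Microsoft's implementation. A real KV cache holds per-layer key/value tensors, and in Memento the model itself writes the summary. Here a token string stands in for each cached position, and a caller-supplied `summarize` function stands in for the model. `ToyKVCache`, `compress_block`, and `run_with_compression` are hypothetical names. Compressing each block as soon as it ends, and assuming each block occupies a contiguous run of the cache, are also assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CacheEntry:
    """One cached position (hypothetical; a token string stands in for K/V tensors)."""
    token: str
    block_id: int
    is_summary: bool = False


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for a KV cache: one entry per token position."""
    entries: list[CacheEntry] = field(default_factory=list)

    def append(self, token: str, block_id: int) -> None:
        self.entries.append(CacheEntry(token, block_id))

    def compress_block(self, block_id: int, summary_tokens: list[str]) -> int:
        """Evict block_id's entries (assumed contiguous) and put the summary
        in their place. Returns how many cache slots were freed."""
        positions = [i for i, e in enumerate(self.entries) if e.block_id == block_id]
        if not positions:
            raise KeyError(f"no entries for block {block_id}")
        start, end = positions[0], positions[-1] + 1
        before = len(self.entries)
        # The summary occupies the evicted block's slot, so later tokens
        # still see a shorter record of this reasoning step.
        self.entries[start:end] = [
            CacheEntry(tok, block_id, is_summary=True) for tok in summary_tokens
        ]
        return before - len(self.entries)


def run_with_compression(
    blocks: list[list[str]],
    summarize: Callable[[list[str]], list[str]],
) -> tuple[ToyKVCache, int]:
    """Feed reasoning blocks one at a time and compress each as soon as it ends
    (an assumed schedule). Returns the final cache and the peak entry count."""
    cache = ToyKVCache()
    peak = 0
    for block_id, tokens in enumerate(blocks):
        for tok in tokens:
            cache.append(tok, block_id)
            peak = max(peak, len(cache.entries))
        cache.compress_block(block_id, summarize(tokens))
    return cache, peak


if __name__ == "__main__":
    blocks = [["let", "x", "=", "3"], ["so", "2x", "=", "6"]]
    # Hypothetical summarizer: keep only the block's final token as a tag.
    cache, peak = run_with_compression(blocks, lambda toks: [f"<sum:{toks[-1]}>"])
    print([e.token for e in cache.entries])  # ['<sum:3>', '<sum:6>']
    print(peak)  # 5 (vs. 8 entries if nothing were evicted)
```

In this toy, the peak cache size is the summaries written so far plus the one block still being generated. With two four-token blocks that is 5 entries instead of 8, and the gap widens as more blocks accumulate.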