07:38
2026-06-16
pub.towardsai.net
large-language-models
Microsoft Taught a Reasoning Model to Compress Its Own Thoughts Mid-Generation.
Microsoft researchers developed Memento, a method that trains large language models to compress their own reasoning steps mid-generation by evicting blocks from the KV cache and replacing them with co…