01:20
2026-06-06
arxiv.org
machine-learning
Unlocking Non-Uniform KV Cache for Efficient Multi-Turn LLM Serving
Researchers introduced Tangram, a serving system that enables non-uniform key-value (KV) cache compression for multi-turn large language model inference. The system uses deterministic budget allocation, he…