05:00
2026-06-22
dev.to
large-language-models
Sparse KV Caches Cut Attention Scaling
MiniMax introduced sparse key-value caches that reduce attention scaling from quadratic to near-linear, enabling practical multi-hundred-kilobyte context windows on a single GPU. The method cuts per-t…
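
To make the quadratic versus near-linear distinction concrete, here is a rough counting sketch. It is not MiniMax's method: it assumes each new token reads at most a fixed number of cached keys, and it ignores whatever it costs to choose those keys. The names `dense_scores`, `sparse_scores`, and `budget` are hypothetical and do not come from the source.

```python
def dense_scores(n: int) -> int:
    """Query-key scores computed by full causal attention over n tokens."""
    # Token t attends to all t keys seen so far (itself included): 1 + 2 + ... + n.
    return n * (n + 1) // 2


def sparse_scores(n: int, budget: int) -> int:
    """Scores computed when each token reads at most `budget` cached keys."""
    # `budget` is a hypothetical per-token cap, not a parameter from the source.
    return sum(min(t, budget) for t in range(1, n + 1))


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        dense = dense_scores(n)
        sparse = sparse_scores(n, budget=512)
        print(f"n={n:>7}  dense={dense:>14,}  sparse={sparse:>12,}  ratio={dense / sparse:.1f}x")
```

With a fixed budget the sparse count grows roughly in proportion to the sequence length, while the dense count grows with its square, so the gap widens as the context gets longer.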