H800 GPU

Mentions: 1 · Type: Person

// recent coverage: 1 mention

2026-06-22 05:00 · dev.to · large-language-models

Sparse KV Caches Cut Attention Scaling

MiniMax introduced sparse key-value (KV) caches that reduce attention scaling from quadratic to near-linear, enabling practical context windows of several hundred thousand tokens on a single GPU. The method cuts per-t…
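The summary above gives no implementation details, so what follows is only a minimal sketch of one generic sparse-KV idea, block-sparse selection. It is not MiniMax's method. The assumptions are that the cache is split into contiguous blocks, each block is summarized by its mean key, and a query attends only to the tokens in its `top_blocks` highest-scoring blocks. All function and parameter names (`block_sparse_attention`, `block_size`, `top_blocks`) are hypothetical. In this flat version, scoring touches n / block_size summaries rather than n keys, and the softmax runs over at most top_blocks × block_size tokens. Getting to truly near-linear total cost would need further tricks, such as hierarchical summaries, which the source does not describe.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    # Element-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax_weighted_sum(query, keys, values) -> Vector:
    # Standard scaled dot-product attention for a single query.
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j in range(len(out)):
            out[j] += w / total * v[j]
    return out


def block_sparse_attention(query, keys, values, block_size, top_blocks):
    # Hypothetical block-sparse KV selection (illustrative, not MiniMax's method).
    # 1. Summarize each contiguous cache block by its mean key.
    summaries = [mean_vector(keys[s:s + block_size])
                 for s in range(0, len(keys), block_size)]
    # 2. Score only the summaries: len(keys) / block_size dot products.
    ranked = sorted(range(len(summaries)),
                    key=lambda b: dot(query, summaries[b]), reverse=True)
    chosen = sorted(ranked[:top_blocks])  # keep original token order
    # 3. Gather at most top_blocks * block_size cached tokens.
    sel_k, sel_v = [], []
    for b in chosen:
        start = b * block_size
        sel_k.extend(keys[start:start + block_size])
        sel_v.extend(values[start:start + block_size])
    # 4. Run ordinary attention over the selected subset only.
    return softmax_weighted_sum(query, sel_k, sel_v)


if __name__ == "__main__":
    keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    values = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
    q = [0.0, 5.0]
    print(block_sparse_attention(q, keys, values, block_size=4, top_blocks=1))
    # prints [0.0, 1.0]: only the second block is selected
```

When `top_blocks` covers every block, the sketch reduces exactly to dense attention over the whole cache. The trade-off is that tokens in unselected blocks get zero weight, so recall depends on how well the block summaries track the individual keys.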

// top 3 co-occurring entities

MiniMax (1) · Grouped Query Attention (1) · MSA (1)