14:00
2026-06-30
blog.r-lopes.com
large-language-models
Thematic Brief: How the KV cache accelerates LLM inference on GPUs
The KV cache accelerates LLM inference on GPUs by storing the key/value projections of prior tokens instead of recomputing them, which reduces the per-step attention cost from quadratic to linear in sequence length. Decode is memory-ban…
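To make the caching idea concrete, here is a minimal single-head sketch in pure Python. It is an illustration under stated assumptions, not how any particular inference engine implements it: the names `KVCache`, `decode_step_cached`, and `decode_step_uncached` are hypothetical, the weights are random, and there is no batching, multi-head logic, or GPU code. The cached step projects only the newest token and appends its key/value to the cache; the uncached step re-projects the whole prefix on every step. In a full multi-layer model, skipping the cache also means re-running attention for every prefix position, which is where the quadratic per-step cost comes from; this one-layer sketch only shows the recomputed projections.

```python
import math
import random


def project(x, W):
    """Multiply row vector x (length d_in) by matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def attend(q, keys, values):
    """Scaled dot-product attention for a single query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


class KVCache:
    """Hypothetical container holding one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step_cached(x_new, Wq, Wk, Wv, cache):
    """Project only the newest token, then attend over the cached K/V."""
    cache.append(project(x_new, Wk), project(x_new, Wv))
    return attend(project(x_new, Wq), cache.keys, cache.values)


def decode_step_uncached(xs, Wq, Wk, Wv):
    """Re-project every token in the prefix on each step (no cache)."""
    keys = [project(x, Wk) for x in xs]
    values = [project(x, Wv) for x in xs]
    return attend(project(xs[-1], Wq), keys, values)


def random_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4  # toy model width (assumption for the demo)
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
tokens = random_matrix(5, d)  # five toy token embeddings

cache = KVCache()
max_diff = 0.0
for t in range(len(tokens)):
    cached_out = decode_step_cached(tokens[t], Wq, Wk, Wv, cache)
    full_out = decode_step_uncached(tokens[: t + 1], Wq, Wk, Wv)
    step_diff = max(abs(a - b) for a, b in zip(cached_out, full_out))
    max_diff = max(max_diff, step_diff)
```

Both paths produce the same attention output at every step; the difference is the work done per step. The trade-off is memory: the cache grows by one key and one value per token (per layer and head in a real model), and each decode step has to read all of it back.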