04:00
2026-06-04
arxiv.org
large-language-models
LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
Researchers have developed LazyAttention, a new attention mechanism that enables zero-copy, position-agnostic key-value cache reuse for large language models by deferring positional encoding to within…