@LazyAttention

Mentions: 1 · Type: Organization

2026-06-04 04:00

arxiv.org

large-language-models

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

Researchers have developed LazyAttention, a new attention mechanism that enables zero-copy, position-agnostic key-value cache reuse for large language models by deferring positional encoding to within…
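The summary describes the mechanism only at a high level, and its account of where positional encoding is deferred to is cut off. The Python sketch below is a rough illustration of why deferring positional encoding makes cached keys reusable at any position. It assumes rotary positional embedding (RoPE), stores keys without any rotation, and applies the rotation only when attention scores are computed. The names `rope`, `kv_cache`, and `attention_scores` are hypothetical. This is not the LazyAttention implementation, which the summary does not describe.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def rope(x: Sequence[float], pos: int, base: float = 10000.0) -> Vector:
    """Apply rotary positional embedding (RoPE) to one vector at position `pos`.

    Each consecutive pair (x[i], x[i+1]) is rotated by angle pos * base**(-i/d).
    Returns a new list; the input is never modified.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: Vector = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out


# Hypothetical position-agnostic cache: chunk id -> list of *unrotated* keys.
# No position is baked into the stored keys, so one chunk can sit at any offset.
kv_cache: Dict[str, List[Vector]] = {}


def attention_scores(query: Vector, query_pos: int,
                     layout: List[Tuple[str, int]]) -> List[float]:
    """Score `query` against cached chunks placed at the given start offsets.

    `layout` is a list of (chunk_id, start_offset) pairs. Positions are applied
    here, at attention time; the cached keys themselves are never rewritten.
    """
    q = rope(query, query_pos)
    d = len(query)
    scores: List[float] = []
    for chunk_id, start in layout:
        for j, key in enumerate(kv_cache[chunk_id]):
            k = rope(key, start + j)  # deferred positional encoding
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
    return scores


kv_cache["doc_a"] = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.9, -0.3, 0.1]]
query = [0.3, -0.7, 1.0, 0.4]

# Reuse the same cached chunk at two different offsets in two prompts.
s1 = attention_scores(query, query_pos=10, layout=[("doc_a", 0)])
s2 = attention_scores(query, query_pos=110, layout=[("doc_a", 100)])
print(all(abs(a - b) < 1e-9 for a, b in zip(s1, s2)))  # True
```

The two score lists match because a RoPE dot product depends only on the relative distance between query and key positions. Shifting both the query and the reused chunk by 100 therefore changes nothing. Note that here "reuse" means the cached entries are shared and never rewritten. The rotated copies are still computed on the fly.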

// co-occurs with top 1 entity

Block-Attention (1)