04:00
2026-06-25
arxiv.org
large-language-models
Dustin: Draft-Augmented Sparse Verification for Efficient Long-Context Generation with Speculative Decoding
Researchers propose Dustin, a sparse verification framework for long-context speculative decoding in LLMs, achieving a 27.85x speedup in self-attention and 9.17x end-to-end decoding speedup at 32k seq…