{"slug": "dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation", "title": "Dustin: Draft-Augmented Sparse Verification for Efficient Long-Context Generation with Speculative Decoding", "summary": "Researchers propose Dustin, a sparse verification framework for long-context speculative decoding in LLMs, achieving a 27.85x speedup in self-attention and 9.17x end-to-end decoding speedup at 32k sequence length with negligible accuracy loss.", "body_md": "arXiv:2606.24957v1 Announce Type: new\nAbstract: While speculative decoding improves inference throughput for multi-batch long-context Large Language Models (LLMs), its efficiency is often limited by a verification bottleneck where Key-Value (KV) cache loading dominates latency. Existing compression methods fail in this regime: static eviction incurs accuracy loss due to saliency shift, while dynamic selection introduces prohibitive computational overhead during the verification path. We propose Dustin, a sparse verification framework designed for long-context speculative decoding. Dustin integrates lookahead signals from the draft model with historical attention from the target model to identify critical tokens with high fidelity across multi-step verification windows. To reduce recomputation latency, this approach further employs a sparse estimation scheme that restricts importance scoring to a minimal subset of attention heads. Evaluations on PG-19 and LongBench with Qwen2.5-72B demonstrate that Dustin achieves a 27.85x speedup in self-attention and a 9.17x end-to-end decoding speedup at a 32k sequence length, all with negligible accuracy degradation.", "url": "https://wpnews.pro/news/dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation", "canonical_source": "https://arxiv.org/abs/2606.24957", "published_at": "2026-06-25 04:00:00+00:00", "updated_at": "2026-06-25 04:14:58.763120+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "machine-learning", "ai-research"], "entities": ["Dustin", "Qwen2.5-72B", "PG-19", "LongBench"], "alternates": {"html": "https://wpnews.pro/news/dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation", "markdown": "https://wpnews.pro/news/dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation.md", "text": "https://wpnews.pro/news/dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation.txt", "jsonld": "https://wpnews.pro/news/dustin-draft-augmented-sparse-verification-for-efficient-long-context-generation.jsonld"}}