grep -l @dflash /news/*.json | wc -l → 12

DFlash

mentions: 12 · type: Organization

// recent coverage 12 mentions

04:00

2026-07-09

arxiv.org

artificial-intelligence

DeLS-Spec: Decoupled Long-Short Contexts for Parallel Speculative Drafting

Researchers propose DeLS-Spec, a speculative decoding method that decouples long- and short-context experts to accelerate LLM inference. By adding a lightweight local head to the existing DFlash model…

20:16

2026-06-27

github.com

machine-learning

GitHub DeepSeek-AI/DeepSpec

DeepSeek-AI released DeepSpec, an open-source codebase for training and evaluating draft models for speculative decoding, supporting three draft model algorithms (DSpark, DFlash, Eagle3) and requiring…

16:59

2026-06-27

marktechpost.com

large-language-models

DeepSeek Releases DSpark, a Speculative Decoding Framework That Accelerates DeepSeek-V4 Per-User Generation 60–85% Over MTP-1

DeepSeek open-sourced DSpark, a speculative decoding framework that accelerates per-user generation for DeepSeek-V4 by 57–85% over the MTP-1 baseline. The framework pairs a parallel draft backbone wit…

13:27

2026-06-27

byteiota.com

large-language-models

DeepSeek DSpark Goes Live with 80% Inference Speed Gains

DeepSeek released DSpark, a speculative decoding framework now live in its DeepSeek-V4 Flash and Pro production API, delivering 51 to 400 percent throughput gains and up to 80 percent latency reductio…

15:00

2026-06-23

developer.nvidia.com

large-language-models

Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

NVIDIA announced that DFlash speculative decoding boosts inference performance on Blackwell GPUs by up to 15x for gpt-oss-120b and nearly doubles interactivity for Llama 3.1 8B compared to EAGLE-3. Th…

09:28

2026-06-22

blog.doubleword.ai

large-language-models

Anatomy of a Diffusion Language Model

Diffusion Language Models (dLLMs) offer a faster alternative to autoregressive LLMs by generating multiple tokens in parallel, but face consistency challenges. Recent innovations like DFlash, Diffusio…

00:00

2026-06-22

fergusfinn.com

large-language-models

Adaptive speculative decoding: picking draft lengths at runtime

Researchers have developed adaptive speculative decoding, a method that dynamically selects draft lengths at runtime to optimize token generation efficiency in large language models. The approach addr…

00:17

2026-06-20

modal.com

large-language-models

Speculation Is All You Need

Modal Labs released state-of-the-art DFlash speculators for Qwen 3.5 and Qwen 3.6 models on Hugging Face, achieving 5-20% additional speedups and enabling Qwen 3.5 122B-A10B to run at over 1000 tok/s …

17:54

2026-06-15

runtimewire.com

large-language-models

SGLang adds DFlash to push Qwen 3.5 397B-A17B inference up to 4.3x faster

Jian Chen, Yesheng Liang, and Zhijian Liu integrated Z Lab's DFlash block diffusion speculative decoding method into SGLang, collaborating with Modal and LMSYS. The team reports up to 4.31x higher thr…

23:16

2026-06-12

letsdatascience.com

large-language-models

Xiaomi MiMo Hits 1,000 Tokens-Per-Second Inference

Xiaomi's MiMo-V2.5-Pro-UltraSpeed, a 1.02-trillion-parameter MoE model, achieved 1,000 tokens per second inference on standard cloud GPUs using FP4 quantization, DFlash speculative decoding, and TileR…

00:00

2026-06-08

fergusfinn.com

artificial-intelligence

The economics of speculative decoding

Speculative decoding, a lossless inference optimisation that predicts future tokens to reduce latency, faces new economic constraints as modern mixture-of-experts (MoE) architectures replace dense tra…

03:11

2026-05-20

developers.googleblog.com

large-language-models

Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding

Researchers at UC San Diego, led by Hao Zhang, have successfully implemented a diffusion-style speculative decoding method called DFlash on Google TPUs, achieving an average 3.13x increase in tokens p…

// co-occurs with top 8 entities

DSpark (4) · Hugging Face (3) · SGLang (3) · Z Lab (3) · DeepSeek (3) · DeepSpec (3) · Eagle3 (3) · Qwen (2)