# SGLang adds DFlash to push Qwen 3.5 397B-A17B inference up to 4.3x faster

> Source: <https://runtimewire.com/article/sglang-dflash-qwen-397b-inference-throughput>
> Published: 2026-06-15 17:54:35+00:00

Jian Chen (@jianchen1799), Yesheng Liang, and Zhijian Liu (@zhijianliu) have brought DFlash, Z Lab's block diffusion speculative decoding method, into SGLang through a collaboration with Modal and LMSYS. The teams report up to 4.31x higher throughput than a non-speculative baseline on Qwen 3.5 397B-A17B. The June 15 release is not another model launch dressed up as infrastructure. It is a serving-stack bet: if open-weight frontier models keep getting larger, the bottleneck shifts from ...
