SGLang adds DFlash to push Qwen 3.5 397B-A17B inference up to 4.3x faster

Jian Chen, Yesheng Liang, and Zhijian Liu integrated DFlash, Z Lab's block diffusion speculative decoding method, into SGLang in collaboration with Modal and LMSYS. The team reports up to 4.31x higher throughput on Qwen 3.5 397B-A17B inference than a non-speculative baseline, an infrastructure gain for serving large models.

Jian Chen (@jianchen1799), Yesheng Liang, and Zhijian Liu (@zhijianliu) have pushed DFlash, Z Lab's block diffusion speculative decoding method, into SGLang through a collaboration with Modal and LMSYS. The teams report up to 4.31x higher throughput over a non-speculative baseline on Qwen 3.5 397B-A17B. The June 15 release is not another model launch dressed up as infrastructure. It is a serving-stack bet: if open-weight frontier models keep getting larger, the bottleneck shifts from ...