2026-06-15 17:54
runtimewire.com
SGLang adds DFlash to push Qwen 3.5 397B-A17B inference up to 4.3x faster
Jian Chen, Yesheng Liang, and Zhijian Liu integrated Z Lab's DFlash block diffusion speculative decoding method into SGLang, collaborating with Modal and LMSYS. The team reports up to 4.31x higher thr…