03:11
2026-05-20
developers.googleblog.com
large-language-models
Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding
Researchers at UC San Diego, led by Hao Zhang, have successfully implemented a diffusion-style speculative decoding method called DFlash on Google TPUs, achieving an average 3.13x increase in tokens pโฆ