JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
Researchers from Hao AI Lab introduced JetFlow, a speculative decoding framework that breaks the scaling ceiling of autoregressive LLMs by combining one-forward drafting efficiency with branch-wise ca…