Source: haoailab.com

JetSpec

JetSpec, a new speculative decoding method, trains a causal parallel draft head over fused hidden states from a frozen target model, enabling lossless verification of candidate trees in one forward pass. On Qwen3-8B with budget 256, JetSpec achieves up to 9.64x speedup on MATH-500 and 4.58x on open-ended chat, translating to around 1000 tokens per second throughput on a single B200 GPU.

Published Jun 22, 2026 · 1 min read

TL;DR: Speculative decoding hits a scaling ceiling: a larger draft budget helps only while acceptance stays high and drafting stays cheap. Prior draft heads face a dilemma: autoregressive drafters condition on each path but pay with tree depth, while block-diffusion drafters draft in one pass but score branches independently, creating plausible yet mutually inconsistent trees.
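The scaling ceiling can be made concrete with a back-of-the-envelope cost model. The sketch below is an illustrative assumption, not a formula from the JetSpec authors: it treats one target forward pass as the unit of time and divides the tokens emitted per verification step by the total cost of that step. The function name, its parameters, and the numbers in the usage note are all hypothetical.

```python
def estimated_speedup(mean_accepted: float,
                      draft_cost: float,
                      verify_overhead: float = 0.0) -> float:
    """Rough speedup over plain autoregressive decoding (assumed model).

    mean_accepted:   average tokens emitted per verification step; at least 1,
                     because the target always contributes one token itself.
    draft_cost:      time spent drafting, in units of one target forward pass.
    verify_overhead: extra verification time beyond one plain forward pass,
                     in the same units (this tends to grow with the budget).
    """
    if mean_accepted < 1.0:
        raise ValueError("each verification step emits at least one token")
    if draft_cost < 0.0 or verify_overhead < 0.0:
        raise ValueError("costs must be non-negative")
    # Plain decoding emits 1 token per unit of time, so the speedup is
    # simply tokens emitted divided by time spent per step.
    step_cost = 1.0 + draft_cost + verify_overhead
    return mean_accepted / step_cost
```

Under this model, a drafter that emits 5.0 tokens per step at a draft cost of 0.25 gives `estimated_speedup(5.0, 0.25) == 4.0`, while a deeper drafter that raises acceptance to 5.5 but costs a full forward pass gives `estimated_speedup(5.5, 1.0) == 2.75`. A larger budget only pays off while the numerator grows faster than the denominator.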

JetSpec trains a causal parallel draft head over fused hidden states from a frozen target model, so candidate-tree scores follow the target’s own autoregressive factorization. The frozen target then verifies the full tree in one forward pass, losslessly. On Qwen3-8B with greedy decoding and a draft budget of 256, JetSpec reaches a 9.64x speedup on MATH-500 and 4.58x on open-ended chat. These gains carry into real single-stream serving: on JetSpec’s own engine, throughput averages around 1000 tokens per second (TPS) on MATH-500 using a single B200 GPU.
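To show why tree verification is lossless under greedy decoding, here is a minimal sketch of the acceptance walk. It is not JetSpec's code: `verify_tree_greedy`, the flat `tokens`/`parents` tree encoding, and the toy `toy_target` model are hypothetical names chosen for illustration. A real engine would score every tree node in one forward pass using a tree attention mask; this sketch calls the target once per accepted node purely for readability.

```python
from typing import Callable, Dict, List, Sequence


def verify_tree_greedy(prefix: Sequence[int],
                       tokens: Sequence[int],
                       parents: Sequence[int],
                       target_next: Callable[[List[int]], int]) -> List[int]:
    """Walk a draft tree and return the tokens greedy decoding would emit.

    tokens[i]  is the draft token stored at node i.
    parents[i] is the index of node i's parent, or -1 if node i hangs
               directly off the current prefix.
    target_next(seq) returns the target model's greedy next token for seq.
    """
    # Group node indices by parent so each node's children are easy to find.
    children: Dict[int, List[int]] = {}
    for node, parent in enumerate(parents):
        children.setdefault(parent, []).append(node)

    emitted: List[int] = []
    current = -1  # start at the root, i.e. the end of the prefix
    while True:
        want = target_next(list(prefix) + emitted)
        match = next((c for c in children.get(current, [])
                      if tokens[c] == want), None)
        # The target's own token is always emitted: it is either an accepted
        # draft token or the correction that ends this verification step.
        emitted.append(want)
        if match is None:
            return emitted
        current = match


def toy_target(seq: List[int]) -> int:
    """Hypothetical stand-in for a target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


# Draft tree under prefix [3]: root -> {4, 7}, 4 -> {5, 9}, 5 -> {6}
draft_tokens = [4, 7, 5, 9, 6]
draft_parents = [-1, -1, 0, 0, 2]
result = verify_tree_greedy([3], draft_tokens, draft_parents, toy_target)
print(result)  # [4, 5, 6, 7]: three accepted draft tokens plus one bonus token
```

Because every emitted token is the target's own greedy choice, the output matches plain greedy decoding token for token; the draft tree only decides how many of those tokens are obtained per verification step.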
