Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference


Researchers have developed Fast-dLLM++, a training-free extension to diffusion large language models that accelerates inference by selecting parallel token commit sets based on the full sorted confidence profile rather than a single worst-case confidence. The method, which introduces Fréchet profile decoding, recovers the previous Fast-dLLM rule in equal-confidence cases and adds a provable heterogeneity bonus when tokens have uneven confidences. Experiments with the LLaDA-8B model on GSM8K, MATH, HumanEval, and MBPP benchmarks demonstrate up to 37% higher throughput at comparable accuracy, establishing a new accuracy–throughput frontier for diffusion LLM inference.

Published Jun 3, 2026

arXiv:2606.02955v1. Abstract: Diffusion large language models promise parallel token generation, yet inference remains bottlenecked by deciding which masked tokens can be safely committed together. Fast-dLLM addressed this with KV caching and confidence-guided parallel decoding, but its decoding theory uses a homogeneous high-confidence assumption that effectively reduces each candidate set to its weakest selected token. We argue that this leaves speed on the table because real decoding steps exhibit heterogeneous confidence profiles. We propose Fast-dLLM++, a training-free extension that introduces Fréchet profile decoding: selecting parallel commit sets from the full sorted confidence profile rather than a single worst-case confidence. The resulting rule is a heterogeneous-confidence generalization of Fast-dLLM's factor selector. It recovers the previous rule exactly in the equal-confidence case and adds a provable heterogeneity bonus when the selected tokens have uneven confidences. Fast-dLLM++ leaves the model, diffusion process, and cache implementation entirely unchanged, making it a drop-in replacement for existing Fast-dLLM decoding. Experiments on GSM8K, MATH, HumanEval, and MBPP with the LLaDA-8B model show that the theoretical improvement translates directly into empirical gains: profile-aware selection improves the accuracy–throughput frontier by exploiting safe parallelism that weakest-token rules miss, achieving up to 37% higher throughput at comparable accuracy. Our anonymous code release is at https://github.com/Ringo-Star/FastdLLM_plusplus.
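The contrast between a weakest-token rule and a profile-aware rule can be illustrated with a small sketch. This is not the selector from the paper or its code release: the abstract does not state the exact rule, so the example assumes a simple Fréchet-style (union-bound) criterion in which a set of tokens is committed while its total error mass stays within a budget. The function names and the `budget` parameter are hypothetical and exist only for illustration.

```python
from typing import List


def weakest_token_commit(confidences: List[float], budget: float) -> int:
    """Illustrative weakest-token rule (assumed, not the paper's exact selector).

    Every one of the n selected tokens is charged the error of the weakest
    selected token, so the set is accepted while n * (1 - c_min) <= budget.
    """
    profile = sorted(confidences, reverse=True)
    best = 0
    for n in range(1, len(profile) + 1):
        worst_error = 1.0 - profile[n - 1]
        if n * worst_error > budget:
            break  # the bound only grows with n, so we can stop here
        best = n
    return best


def profile_commit(confidences: List[float], budget: float) -> int:
    """Illustrative profile-aware rule (assumed, not the paper's exact selector).

    Each selected token is charged its own error, so the set is accepted
    while sum(1 - c_i) <= budget. By the Frechet inequality, the probability
    that all selected tokens are correct is at least 1 - sum(1 - c_i).
    """
    profile = sorted(confidences, reverse=True)
    best = 0
    total_error = 0.0
    for n, c in enumerate(profile, start=1):
        total_error += 1.0 - c
        if total_error > budget:
            break  # the running sum only grows with n, so we can stop here
        best = n
    return best


# Uneven confidences: the profile-aware rule commits one more token.
uneven = [0.99, 0.98, 0.97, 0.90, 0.60]
print(weakest_token_commit(uneven, budget=0.2))  # 3
print(profile_commit(uneven, budget=0.2))        # 4

# Equal confidences: both rules agree.
equal = [0.95] * 6
print(weakest_token_commit(equal, budget=0.22))  # 4
print(profile_commit(equal, budget=0.22))        # 4
```

Under this assumed criterion the two rules coincide when all confidences are equal, because n * (1 - c) equals the sum of the individual errors. With uneven confidences the sum is smaller than n times the worst error, which is one way to picture the heterogeneity bonus the abstract describes.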

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.02955
