04:00
2026-06-29
arxiv.org
large-language-models
EntMTP: Accelerating LLM Inference with Entropy Guided Multi Token Prediction
Researchers propose EntMTP, a training-free scheduler that dynamically adjusts multi-token prediction depth based on local entropy, achieving up to 1.36x speedup over Medusa baselines in LLM inference…
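
To make "adjusting prediction depth based on local entropy" concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the function names, the linear entropy-to-depth mapping, the depth bounds, and the assumption that low entropy (a confident model) warrants deeper speculation are all illustrative choices that the summary does not confirm.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def choose_depth(probs, max_depth=4, min_depth=1):
    """Pick how many tokens to predict ahead from the next-token distribution.

    Hypothetical rule (an assumption, not taken from the paper): normalize
    the entropy to [0, 1] and interpolate linearly, so that a confident
    (low-entropy) distribution gets max_depth and a uniform one gets
    min_depth.
    """
    if len(probs) < 2:
        return max_depth  # a single outcome carries no uncertainty
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    normalized = shannon_entropy(probs) / max_entropy
    depth = max_depth - round(normalized * (max_depth - min_depth))
    return max(min_depth, min(max_depth, depth))


confident = [1.0, 0.0, 0.0, 0.0]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(choose_depth(confident), choose_depth(uncertain))
```

Because a rule of this kind only reads the distribution the model already produces, it needs no extra training, which is consistent with the "training-free" description above.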