16:00
2026-05-27
dev.to
large-language-models
Why your quantized LLM loses its MTP heads and how to keep them
A developer discovered that standard quantization pipelines for large language models silently discard multi-token prediction (MTP) heads, causing speculative decoding speedups to vanish despite the b…