14:21
2026-06-25
news.ycombinator.com
large-language-models
I was curious why MTP affects PP TPS in llama.cpp. My PoC recovers it?
A developer investigating low prompt processing throughput with Multi-Token Prediction (MTP) in llama.cpp created a proof-of-concept that recovers the overhead by processing only the output row of the…