Minimax M2.x

mentions: 1 | type: Person

// recent coverage (1 mention)

14:21

2026-06-25

news.ycombinator.com

large-language-models

I was curious why MTP affects PP TPS in llama.cpp. My PoC recovers it?

A developer investigating low prompt processing throughput with Multi-Token Prediction (MTP) in llama.cpp created a proof-of-concept that recovers the overhead by processing only the output row of the…

// co-occurs with top 7 entities

llama.cpp (1), Qwen3.6-35B-A3B (1), GLM 5.1 (1), GLM 5.2 (1), Minimax M3 (1), Modal (1), Codeberg (1)