2026-06-02
together.ai
large-language-models
Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets
Together AI partnered with MiniMax to serve the new M3 model in production, achieving 81–125% throughput improvements through engineering optimizations including a KV-Block-Major sparse attention kern…