23:16
2026-06-12
letsdatascience.com
Xiaomi MiMo Hits 1,000 Tokens-Per-Second Inference
Xiaomi's MiMo-V2.5-Pro-UltraSpeed, a 1.02-trillion-parameter MoE model, achieved inference speeds of 1,000 tokens per second on standard cloud GPUs using FP4 quantization, DFlash speculative decoding, and TileR…
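A back-of-the-envelope estimate shows why FP4 quantization matters at this scale. In a mixture-of-experts (MoE) model only a subset of experts runs per token, but all 1.02 trillion parameters still have to be stored. The minimal sketch below compares raw weight storage at a few precisions. It assumes weights only and ignores quantization scale factors, the KV cache, activations, and any layers kept at higher precision, so real deployments will need more memory than these figures.

```python
# Rough weight-storage estimate for a 1.02-trillion-parameter model.
# Assumption: weights only; ignores quantization scale factors, KV cache,
# activations, and any layers kept at higher precision.

TOTAL_PARAMS = 1.02e12  # total parameter count from the summary

# Bits per parameter for a few common weight formats.
BITS_PER_PARAM = {
    "BF16": 16,
    "FP8": 8,
    "FP4": 4,
}


def weight_gigabytes(params: float, bits: int) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for fmt, bits in BITS_PER_PARAM.items():
        print(f"{fmt}: {weight_gigabytes(TOTAL_PARAMS, bits):,.0f} GB")
```

Under these assumptions, FP4 cuts raw weight storage from roughly 2,040 GB at BF16 to roughly 510 GB, a 4x reduction.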