13:00
2026-06-15
dev.to
large-language-models
Hybrid Mamba-Transformer MoEs Hide Their Stalls in Places Dashboards Do Not Look
A developer traced a hybrid Mamba-Transformer MoE inference run and found that MoE all-to-all collective stalls dominate the tail latency, with a 69x tail ratio, despite dashboards showing 96% GPU uti…