16:45
2026-06-15
developer.nvidia.com
machine-learning
Boosting MoE Training Throughput with Advanced Fusion Kernels
NVIDIA introduced advanced fused MLP kernels for mixture-of-experts (MoE) models, built with the CuTe DSL, delivering 1.3x–2x kernel-level speedups and enabling sync-free MoE execution. The optimizati…