CuTe DSL

mentions 1 type Person

// recent coverage 1 mention

16:45

2026-06-15

developer.nvidia.com

machine-learning

Boosting MoE Training Throughput with Advanced Fusion Kernels

NVIDIA introduced advanced fused MLP kernels for mixture-of-experts (MoE) models, built with the CuTe DSL, delivering 1.3x–2x kernel-level speedups and enabling sync-free MoE execution. The optimizati…

// co-occurs with top 7 entities

NVIDIA (1), DeepSeek-V3 (1), GPT-OSS (1), cuDNN Frontend (1), Transformer Engine (1), Megatron-Core (1), Tensor Cores (1)

// topics top 5 topics

machine learning (1), large language models (1), ai infrastructure (1), ai chips (1), developer tools (1)