16:16
2026-06-26
gist.github.com
developer-tools
cuda-oxide: a speed-of-light GEMM in pure Rust (companion notes for the stream)
A developer built a matrix-multiply kernel in pure Rust, called cuda-oxide, that achieves 58% of NVIDIA's hand-tuned library performance on a Blackwell GPU. The kernel, gemm_sol_clc_multicast_4_stage_…