16:02
2026-06-21
moonmath.ai
machine-learning
A Fast Attention Kernel for MI300X, Written in HIP, Not Assembly
A team of kernel engineers developed a bf16 forward attention kernel for AMD MI300X GPUs using HIP, outperforming AMD's own AITER v3 library by up to 1.26× across various token lengths and rounding mo…