19:03
2026-06-30
rocm.blogs.amd.com
large-language-models
Accelerating LLM Inference on AMD GPUs with Low-Latency GEMMs
AMD announced a new kernel family, LDS-Pipelined Split-K GEMM, that accelerates LLM inference on AMD GPUs by optimizing decode-time GEMMs with small M and large N/K dimensions. The technique achieves …