16:18
2026-05-28
blog.kog.ai
large-language-models
Building a single-kernel, latency-optimized LLM inference engine on AMD MI300X GPUs
The Kog AI team implemented a single-kernel LLM inference engine on AMD MI300X GPUs, achieving over 3,000 output tokens per second per request for a 2B-parameter model in FP16 precision. The monokerne…