04:59
2026-06-17
dev.to
large-language-models
Kog hits 3K t/s on MI300X, no kernel switches: test it now
Kog AI achieved over 3,000 output tokens per second per request for a 2B-parameter FP16 model on a single 8× MI300X node, using a monokernel that eliminates per-token kernel launches. The technique collapses th…
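For a sense of scale, 3,000 tokens per second per request leaves roughly 333 µs of wall-clock time for each decode step. The following back-of-envelope sketch is not Kog's method or code. It shows how quickly per-token kernel launches would eat into that budget, and how a single persistent launch amortizes over a whole generation. The helper names, the 5 µs per-launch overhead, and the 1,000-token generation length are hypothetical illustration values, not measurements.

```python
# Back-of-envelope model of kernel-launch overhead versus decode speed.
# ASSUMPTION: the 5 us launch overhead and 1,000-token generation length
# used below are hypothetical values for illustration only.

def per_token_budget_us(tokens_per_second: float) -> float:
    """Wall-clock time available per generated token, in microseconds."""
    return 1_000_000.0 / tokens_per_second


def max_launches_in_budget(tokens_per_second: float, launch_overhead_us: float) -> int:
    """How many kernel launches fit in one token's budget if launches were the only cost."""
    return int(per_token_budget_us(tokens_per_second) // launch_overhead_us)


def amortized_launch_us(launch_overhead_us: float, tokens_generated: int) -> float:
    """Per-token launch cost when one persistent kernel serves the whole generation."""
    return launch_overhead_us / tokens_generated


if __name__ == "__main__":
    print(f"{per_token_budget_us(3000):.1f}")        # 333.3
    print(max_launches_in_budget(3000, 5.0))         # 66
    print(f"{amortized_launch_us(5.0, 1000):.3f}")   # 0.005
```

Under these assumed numbers, a conventional decode step that issues more than about 66 launches per token could not reach 3,000 t/s even with zero compute time. By contrast, a single launch spread over 1,000 tokens costs only a few nanoseconds per token.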