09:47
2026-05-29
blog.kog.ai
large-language-models
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
Kog AI launched a tech preview of its Kog Inference Engine (KIE), achieving 3,000 output tokens per second per request on 8× AMD MI300X GPUs and 2,100 per request on 8× NVIDIA H200 GPUs, using a 2B-parameter model. The comp…