06:03
2026-06-03
dev.to
large-language-models
llama.cpp b9455 Finally Caught vLLM: 70t/s on 2x3090 Qwen 27B UQ8
A Reddit user reported that llama.cpp build b9455 achieved 67-81 tokens per second on a dual RTX 3090 setup running Unsloth's Qwen3.6-27B-UD-Q8_K_XL model, matching the speed of vLLM for multi-GPU inf…