@UD-Q8

mentions: 1 · type: Organization · feed: RSS

06:03

2026-06-03

dev.to

large-language-models

llama.cpp b9455 Finally Caught vLLM: 70t/s on 2x3090 Qwen 27B UQ8

A Reddit user reported that llama.cpp build b9455 achieved 67-81 tokens per second on a dual RTX 3090 setup running Unsloth's Qwen3.6-27B-UD-Q8_K_XL model, matching the speed of vLLM for multi-GPU inf…

// co-occurs with top 7 entities

llama.cpp (1), vLLM (1), Reddit (1), Unsloth (1), Qwen (1), RTX 3090 (1), GGUF (1)