09:36
2026-06-18
dev.to
developer-tools
llama-bench skipped FA on capable GPUs: b9437 corrects it
Build b9437 of llama.cpp fixes two default-value bugs in llama-bench that caused flash attention to be skipped on capable GPUs and GPU-layer count to use a legacy sentinel. The flash attention flag no…