llama-cli

mentions: 1 · type: Organization

// recent coverage 1 mention

09:36

2026-06-18

dev.to

developer-tools

llama-bench skipped FA on capable GPUs — b9437 corrects it

Build b9437 of llama.cpp fixes two default-value bugs in llama-bench: one caused flash attention to be skipped on capable GPUs, and the other caused the GPU-layer count to use a legacy sentinel value. The flash attention flag no…

// co-occurs with top 7 entities

llama.cpp (1), llama-bench (1), llama-server (1), JohannesGaessler (1), pwilkin (1), CUDA (1), Metal (1)

// topics top 3 topics

developer tools (1), machine learning (1), large language models (1)