18:28
2026-07-01
blog.alexewerlof.com
large-language-models
Sampling args in llama-server
Llama.cpp users can significantly improve inference speed and output quality by tuning sampling parameters such as temperature, TopP, MinP, TopK, repeat penalty, DRY, XTC, Dynatemp, Adaptive-P, and Mi…