Sampling strategies compared: temperature, top-p, top-k, min-p, and what actually works in production

A developer has mapped the four most common LLM sampling parameters (temperature, top-p, top-k, and min-p) to their concrete effects on output distributions. The result is a practical guide for production deployment that does not rely on general-purpose defaults. The analysis shows that temperature is applied before softmax as a distribution-wide transform. Values above 1 flatten the distribution and give low-probability tokens a real chance of being picked. Top-p, top-k, and min-p, by contrast, truncate the distribution after softmax. The order of operations matters: setting temperature to zero collapses sampling into greedy decoding (always pick the top token), so any truncation parameter applied afterwards has no effect.

You deploy a chatbot, pick temperature 0.7 because every blog post says so, and the first live user sends back screenshots of responses that drift into gibberish mid-sentence. A colleague suggests top-p 0.9. Another says top-k 50. Someone new to the team mentions min-p and claims it solves everything. You have no benchmark, no test set, and no way to tell whether any of these knobs actually fixes your specific problem or just makes the outputs shorter.

This is the state of sampling parameter selection for most teams shipping LLM products. The parameters are poorly documented, they interact in non-intuitive ways, and the defaults in most inference engines are tuned for general-purpose chat benchmarks, not for your use case. This post maps the four most common sampling knobs (temperature, top-p, top-k, and min-p) to the concrete effects they have on the output distribution, so you can pick the right one, or the right combination, without guessing.

Every LLM generates text one token at a time by choosing from a probability distribution over the vocabulary. The raw distribution (the logits from the model's final layer, passed through softmax) is almost never used directly. A typical raw distribution might assign 0.3 to the top token and spread most of the remaining probability thinly across tens of thousands of other tokens, each getting something on the order of 0.00001.
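The pipeline summarized above (temperature before softmax, truncation after) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not any inference engine's implementation. The function names are hypothetical. Truncation is assumed to run in the order top-k, then top-p, then min-p, renormalizing after each step. Everything operates on plain lists for a single position rather than batched tensors. Real engines may order or combine these steps differently.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs):
    # Rescale the surviving probabilities so they sum to 1 again.
    total = sum(probs)
    return [q / total for q in probs]


def top_k_filter(probs, k):
    # Keep the k most probable tokens; zero out everything else.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    return [q if i in keep else 0.0 for i, q in enumerate(probs)]


def top_p_filter(probs, p):
    # Keep the smallest set of top tokens whose cumulative probability reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [q if i in keep else 0.0 for i, q in enumerate(probs)]


def min_p_filter(probs, min_p):
    # Keep tokens whose probability is at least min_p times the top token's.
    threshold = min_p * max(probs)
    return [q if q >= threshold else 0.0 for q in probs]


def sample_distribution(logits, temperature=1.0, top_k=None, top_p=None, min_p=None):
    """Return the next-token distribution after temperature and truncation."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy decoding: all mass goes to the argmax, so truncation is moot.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    # Temperature divides the logits BEFORE softmax (T < 1 sharpens, T > 1 flattens).
    probs = softmax([x / temperature for x in logits])
    # Truncation runs on probabilities AFTER softmax; this order is an assumption.
    if top_k is not None:
        probs = renormalize(top_k_filter(probs, max(1, top_k)))
    if top_p is not None:
        probs = renormalize(top_p_filter(probs, top_p))
    if min_p is not None:
        probs = renormalize(min_p_filter(probs, min_p))
    return probs


def sample_token(logits, rng=random, **kwargs):
    # Draw one token index from the reshaped distribution.
    probs = sample_distribution(logits, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0, -1.0]  # toy 4-token vocabulary
    print(sample_distribution(logits))  # raw distribution, no reshaping
    print(sample_distribution(logits, temperature=0.7, min_p=0.1))
    print(sample_token(logits, rng=random.Random(0), temperature=0.7, top_p=0.9))
```

Every filter in this sketch always keeps the top token, so renormalization never divides by zero. With `temperature=0` the `top_p`, `top_k`, and `min_p` arguments are ignored entirely, which is the order-of-operations point made above. Calling it with `temperature=1.0` and no truncation returns the raw distribution.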
Always taking the top token from the raw distribution (greedy decoding) produces a narrow band of high-probability continuations that sound repetitive and robotic. Sampling from it unmodified has the opposite problem: the long tail occasionally surfaces tokens that make no sense. Sampling parameters reshape this distribution. The goal is to allow enough spread for creative or useful variation, but not so much that the model assigns meaningful probability to nonsensical tokens. Each parameter attacks a different failure mode:

The following diagram shows how each strategy transforms the same logit distribution:

flowchart LR A Raw logits