Sampling Parameters

Temperature + top-p

Temperature and top-p are the two knobs that control randomness in language model output. Knowing when to turn each one determines how varied, or how repeatable, the responses to a given prompt will be.

Temperature

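Temperature divides the logits by T before the softmax. Below 1, the distribution sharpens toward the highest-probability tokens. As the temperature approaches 0, sampling converges on always picking the top token; most implementations treat temp=0 as greedy (argmax) decoding rather than dividing by zero. At temp=1, the original distribution is unchanged. Above 1, the distribution flattens, making low-probability tokens more likely.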
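To make the scaling concrete, here is a minimal sketch in plain Python. The function name `softmax_with_temperature` and the three example logits are made up for illustration, and treating temp=0 as argmax is an assumption about how a given implementation handles that case.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by `temperature`."""
    if temperature <= 0:
        # Assumption: temp=0 means greedy decoding, so all mass goes to the argmax.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share shrinking as the temperature rises, while the least likely token gains probability.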

temp=0
Evaluations

Use for benchmarks, classification, fact extraction, and any task where reproducibility matters. The same input should yield the same output, although some hosted APIs and GPU kernels still show small run-to-run differences.

temp=0.7
Chat

The sweet spot for conversational agents. Enough variance to feel natural and avoid repetition, but not so much that the model goes off the rails.

temp=1.0+
Creative

Storytelling, brainstorming, poetry. High temperature surfaces surprising word choices. Pair with top-p to cap the tail risk.

Top-p (nucleus sampling)

Top-p sorts tokens by probability in descending order, then keeps the smallest set whose cumulative probability reaches p. Everything outside that nucleus is zeroed out, and the surviving probabilities are renormalized so they sum to 1. This adjusts the candidate pool dynamically: narrow when the model is confident, wide when it is uncertain.
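The truncation step can be sketched in a few lines of plain Python. `top_p_filter` is a hypothetical helper name, the example probabilities are made up, and the inclusive `>=` comparison is an assumption, since implementations differ on how they treat the boundary.

```python
def top_p_filter(probs, p):
    """Zero out tokens outside the nucleus and renormalize the rest."""
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # smallest set whose mass reaches p
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / cumulative  # renormalize the survivors
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over four tokens
print(top_p_filter(probs, 0.9))
```

With p=0.9 the first three tokens form the nucleus (cumulative mass 0.95) and the fourth is dropped.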

When top-p shines

  • Long-form generation. Temperature alone can let probability mass bleed into hundreds of junk tokens. Top-p clips the tail.
  • Mixed-confidence tasks. When some steps are obvious (next word in a common phrase) and others are open-ended, top-p adapts automatically.
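  • Reducing repetition. Greedy or very low-temperature decoding tends to loop on the same high-probability continuations. Sampling with a moderate temperature and top-p keeps several plausible candidates in play, which makes such loops less likely.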
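The adaptive behaviour in the mixed-confidence bullet can be illustrated by counting how many tokens survive for two made-up distributions. `nucleus_size` is a hypothetical helper, not part of any library.

```python
def nucleus_size(probs, p):
    """Count how many tokens top-p keeps for a given distribution."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

confident = [0.92, 0.05, 0.02, 0.01]              # one obvious next token
uncertain = [0.3, 0.25, 0.2, 0.12, 0.08, 0.05]    # many plausible next tokens

print(nucleus_size(confident, 0.9))  # 1
print(nucleus_size(uncertain, 0.9))  # 5
```

The same p=0.9 keeps a single token when the model is confident and five of six when it is not, without any per-step tuning.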

Recommended defaults

Use case                  Temperature  Top-p
Evaluations / extraction  0            1.0
Chat / assistants         0.7          0.9
Creative writing          1.0          0.95
Code generation           0.2          0.95

These are starting points. The best settings depend on your model, prompt, and tolerance for variance. Change one knob at a time and evaluate.
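One way to keep these starting points in code is a small preset table. The names `SAMPLING_PRESETS` and `sampling_params` are hypothetical, and the keys `temperature` and `top_p` are an assumption about what your API calls these parameters, so check your provider's documentation before passing them through.

```python
# Hypothetical preset table; values copied from the recommended defaults.
SAMPLING_PRESETS = {
    "evaluation": {"temperature": 0.0, "top_p": 1.0},
    "chat": {"temperature": 0.7, "top_p": 0.9},
    "creative": {"temperature": 1.0, "top_p": 0.95},
    "code": {"temperature": 0.2, "top_p": 0.95},
}

def sampling_params(use_case, **overrides):
    """Return a copy of the preset for `use_case`, with optional overrides."""
    params = dict(SAMPLING_PRESETS[use_case])  # copy so the preset stays intact
    params.update(overrides)
    return params

print(sampling_params("chat"))
print(sampling_params("chat", temperature=0.5))  # change one knob at a time
```

Overriding a single key per experiment makes it easier to attribute any change in output quality to that one setting.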

How they interact

In most implementations, temperature is applied first and reshapes the entire distribution; top-p then truncates the reshaped distribution. A high temperature flattens the distribution, so more tokens are needed to reach the cumulative threshold and tokens that were originally low-probability can enter the nucleus. Conversely, a low top-p can undo some of the flattening that temperature introduced by cutting those tokens back out. Most APIs let you set both; if you set only one, the other defaults to a neutral value (usually temp=1, top_p=1).
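The interaction can be sketched by chaining the two steps in that order. This is a minimal illustration in plain Python, not any particular library's sampler; `nucleus` and `sample_token` are hypothetical names, the logits are made up, and the temperature-then-top-p ordering is the assumption described above.

```python
import math
import random

def nucleus(logits, temperature, top_p):
    """Return (indices, probs) kept after temperature scaling, then top-p."""
    # Step 1: temperature reshapes the whole distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Step 2: top-p truncates the reshaped distribution.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, [probs[i] for i in kept]

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index; temperature <= 0 falls back to greedy."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    kept, weights = nucleus(logits, temperature, top_p)
    # random.choices normalizes the weights, so no explicit renormalization.
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [4.0, 2.0, 1.0, 0.0, -1.0]  # made-up scores for five tokens
for t in (0.5, 1.0, 2.0):
    kept, _ = nucleus(logits, t, top_p=0.9)
    print(t, len(kept))  # nucleus sizes: 1, 2, 4

print(sample_token(logits, temperature=0.7, top_p=0.9, rng=random.Random(0)))
```

With top_p fixed at 0.9, raising the temperature from 0.5 to 2.0 grows the nucleus from one token to four, which is the flattening effect in action. Lowering top_p at the same temperature shrinks it again.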