Temperature + top-p
Two knobs that control randomness in language model outputs. Knowing when to turn each one lets you trade reproducibility for variety on purpose rather than by accident.
Temperature
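Temperature divides the logits before softmax. At temp=0, which implementations treat as greedy decoding because dividing by zero is undefined, the model always picks the highest-probability token, so decoding is deterministic. Below 1, the distribution sharpens toward the top tokens. At temp=1, the original distribution is unchanged. Above 1, the distribution flattens, making low-probability tokens more likely.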
Low (0 to about 0.2). Use for benchmarks, classification, fact extraction, and any task where reproducibility matters. At 0, the same input yields the same output, apart from small numerical differences that some serving stacks introduce.
Medium (around 0.7). The sweet spot for conversational agents. Enough variance to feel natural and avoid repetition, but not so much that the model goes off the rails.
High (around 1.0 and above). Storytelling, brainstorming, poetry. High temperature surfaces surprising word choices. Pair with top-p to cap the tail risk.
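The logit scaling described in this section can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation: the function name `softmax_with_temperature` and the three logit values are made up for the example, and treating temperature 0 as greedy decoding is an assumption that mirrors how implementations commonly handle it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return token probabilities after dividing the logits by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Dividing by zero is undefined, so treat 0 as greedy decoding:
        # all probability mass goes to the highest-logit token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up logits for three candidate tokens
for t in (0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the top token's share shrinking as temperature rises, which is the flattening described above.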
Top-p (nucleus sampling)
Top-p sorts tokens by probability descending, then keeps the smallest set whose cumulative probability reaches at least p. Everything outside that nucleus gets zeroed out, and the surviving probabilities are renormalized to sum to 1. This dynamically adjusts the candidate pool: narrow for confident predictions, wide when the model is uncertain.
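A minimal sketch of that filtering step follows. The helper name `top_p_filter` and the four-token distribution are hypothetical, and the boundary test (`>=` rather than `>`) is an assumption, since implementations differ on whether the nucleus must reach or strictly exceed p.

```python
def top_p_filter(probs, p):
    """Zero out tokens outside the nucleus and renormalize the rest.

    The nucleus is the smallest set of highest-probability tokens whose
    cumulative probability reaches p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Token indices sorted from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumption: "reaches p" closes the nucleus
            break
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total  # renormalize the survivors
    return filtered

probs = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over four tokens
nucleus = top_p_filter(probs, 0.9)
print([round(x, 3) for x in nucleus])
```

With p=0.9, the first three tokens are needed to reach the threshold, so only the last token is dropped.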
When top-p shines
- Long-form generation. Temperature alone can let probability mass bleed into hundreds of junk tokens. Top-p clips the tail.
- Mixed-confidence tasks. When some steps are obvious (next word in a common phrase) and others are open-ended, top-p adapts automatically.
- Reducing repetition. Greedy or near-greedy decoding can lock into loops of the same high-probability tokens. Sampling from the nucleus at a moderate temperature keeps some variety while still excluding the unlikely tail.
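The "adapts automatically" point is easiest to see by counting how many tokens land in the nucleus for a confident step versus an uncertain one. The sketch below uses a hypothetical helper, `nucleus_size`, and two invented five-token distributions.

```python
def nucleus_size(probs, p):
    """Count the tokens needed before cumulative probability reaches p."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)  # rounding kept the sum just under p; keep everything

confident = [0.92, 0.04, 0.02, 0.01, 0.01]  # e.g. next word of a common phrase
uncertain = [0.25, 0.22, 0.20, 0.18, 0.15]  # e.g. an open-ended continuation

print(nucleus_size(confident, 0.9))  # 1
print(nucleus_size(uncertain, 0.9))  # 5
```

The same p=0.9 yields a single candidate when the model is confident and all five when it is not, with no per-step tuning.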
Recommended defaults
| Use case | Temperature | Top-p |
|---|---|---|
| Evaluations / extraction | 0 | 1.0 |
| Chat / assistants | 0.7 | 0.9 |
| Creative writing | 1.0 | 0.95 |
| Code generation | 0.2 | 0.95 |
These are starting points. The best settings depend on your model, prompt, and tolerance for variance. Change one knob at a time and evaluate.
How they interact
Temperature is typically applied first: it reshapes the entire distribution. Top-p then truncates the reshaped distribution. This means a high temperature widens the nucleus: with probability mass spread more evenly, more tokens are needed to reach p, so tokens that were originally low-probability can make the cut. Conversely, a low top-p can undo some of the flattening that temperature introduced. Most APIs let you set both; if you only set one, the other usually defaults to a neutral value (typically temp=1, top_p=1).
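To make the interaction concrete, the sketch below applies temperature and then measures the nucleus at a fixed top-p. It assumes the temperature-then-top-p order described above; the helper names `softmax` and `nucleus_indices` and the six logit values are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Probabilities after dividing the logits by a positive temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_indices(probs, p):
    """Indices of the smallest top-probability set whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

logits = [5.0, 3.0, 2.0, 1.0, 0.0, -1.0]  # made-up logits for six tokens

# Step 1: temperature reshapes the distribution. Step 2: top-p truncates it.
sizes = {t: len(nucleus_indices(softmax(logits, t), 0.9)) for t in (0.5, 1.0, 2.0)}
print(sizes)  # {0.5: 1, 1.0: 2, 2.0: 4}

# A lower top-p counteracts the flattening from a high temperature.
print(len(nucleus_indices(softmax(logits, 2.0), 0.5)))  # 1
```

At top_p=0.9 the nucleus grows from one token to four as temperature rises from 0.5 to 2.0, while dropping top_p to 0.5 at temperature 2.0 shrinks it back to a single token.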