Self-Consistency Sampling
Replace greedy decoding with diverse reasoning paths. Sample multiple chains-of-thought, then marginalize over them to find the most consistent answer.
Why It Works
A single greedy decode can latch onto a spurious pattern. Self-consistency runs the same prompt several times with temperature > 0, producing varied reasoning traces. The answer that appears most frequently across those traces is the one least sensitive to sampling noise, which makes the vote a cheap approximation of marginal MAP inference over reasoning paths.
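One way to write the vote down, as a sketch in notation introduced here for illustration (x is the prompt, r_i a sampled reasoning trace, a_i the answer extracted from it, and p_T the model's sampling distribution at temperature T; none of these symbols come from the text above):

```latex
\[
\hat{a} \;=\; \arg\max_{a} \sum_{i=1}^{N} \mathbb{1}[a_i = a],
\qquad (r_i, a_i) \sim p_T(r, a \mid x),
\]
\[
\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[a_i = a]
\;\xrightarrow{\;N \to \infty\;}\;
\sum_{r} p_T(r, a \mid x) \;=\; p_T(a \mid x).
\]
```

Under these assumptions, each answer's vote share is a Monte Carlo estimate of its marginal probability with the reasoning trace summed out, so the majority answer estimates the mode of that marginal.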
The Recipe
- 1. Set temperature ≥ 0.5. Deterministic decoding (T=0) produces identical traces: no diversity, no benefit.
- 2. Sample N=5–21 paths. An odd N avoids ties only when there are two candidate answers; with more, ties remain possible. More paths reduce variance, but cost grows linearly with N.
- 3. Extract final answers. Parse each trace for the concluding answer, using a regex on the last line or a structured delimiter.
- 4. Majority vote. Return the most common answer. Break ties by picking the answer from the highest-likelihood trace.
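The four steps can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever model client is in use, assumed to return a `(trace, logprob)` pair per call, and the `Answer:` delimiter is likewise an assumption about how the prompt asks the model to format its conclusion.

```python
import re
from collections import Counter
from typing import Callable, Dict, Optional, Tuple

# Assumed delimiter: the prompt asks the model to end with "Answer: <value>".
ANSWER_RE = re.compile(r"Answer:\s*(.+)", re.IGNORECASE)

# Hypothetical sampler signature: (prompt, temperature) -> (trace, total log-probability).
SampleFn = Callable[[str, float], Tuple[str, float]]


def extract_answer(trace: str) -> Optional[str]:
    """Return the text after the last 'Answer:' delimiter, or None if absent."""
    matches = ANSWER_RE.findall(trace)
    return matches[-1].strip() if matches else None


def self_consistency(
    sample_fn: SampleFn,
    prompt: str,
    n: int = 5,
    temperature: float = 0.7,
) -> Optional[str]:
    """Sample n traces, vote on extracted answers, break ties by trace likelihood."""
    votes: Counter = Counter()
    best_logprob: Dict[str, float] = {}

    for _ in range(n):
        trace, logprob = sample_fn(prompt, temperature)
        answer = extract_answer(trace)
        if answer is None:
            continue  # unparseable trace: it simply casts no vote
        votes[answer] += 1
        # Remember the most likely trace that produced each answer.
        best_logprob[answer] = max(logprob, best_logprob.get(answer, float("-inf")))

    if not votes:
        return None

    top_count = max(votes.values())
    tied = [a for a, c in votes.items() if c == top_count]
    # Tie-break: the answer backed by the highest-likelihood trace wins.
    return max(tied, key=lambda a: best_logprob[a])
```

Answers are compared as raw strings here, so `18` and `18.0` would count as different votes; in practice some normalization step before voting is usually needed.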
When to Use
Arithmetic & Math
GSM8K, MATH — where a single misstep ruins the answer.
Commonsense QA
StrategyQA, Date Understanding — multiple plausible paths.
Code Generation
Sample N solutions, pick the one that passes unit tests.
Symbolic Reasoning
Last-letter concatenation, coin-flip tracking.
Cost Note
Self-consistency multiplies token spend by N. For latency-sensitive deployments, run the N samples in parallel and aggregate. For budget-constrained pipelines, start with N=5 and increase only if answer entropy remains high.
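The "start small, grow only if entropy stays high" policy might look like the sketch below. The names `draw_answer`, `step`, and `max_entropy_bits`, and the 0.9-bit threshold, are illustrative assumptions rather than values from the text; `draw_answer` stands for one sample-and-extract call that returns an already parsed answer.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adaptive_vote(
    draw_answer: Callable[[], str],  # hypothetical: one sampled, parsed answer
    start_n: int = 5,
    max_n: int = 21,
    step: int = 4,
    max_entropy_bits: float = 0.9,  # assumed threshold; tune per task
) -> Tuple[str, int]:
    """Return (majority answer, number of samples actually drawn)."""
    answers: List[str] = [draw_answer() for _ in range(start_n)]

    # Keep sampling while the vote is still spread out and budget remains.
    while answer_entropy(answers) > max_entropy_bits and len(answers) < max_n:
        extra = min(step, max_n - len(answers))
        answers.extend(draw_answer() for _ in range(extra))

    winner = Counter(answers).most_common(1)[0][0]
    return winner, len(answers)
```

When the first five samples agree, the entropy is zero and the loop never runs, so easy prompts cost 5 samples while contested ones can grow up to `max_n`.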