
Self-Consistency Sampling

Replace greedy decoding with diverse reasoning paths. Sample multiple chains-of-thought, then marginalize over them to find the most consistent answer.

Why It Works

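A single greedy decode can latch onto a spurious pattern early and never recover. Self-consistency runs the same prompt several times with temperature > 0, producing varied reasoning traces. Flawed traces tend to scatter across different wrong answers, while sound traces tend to converge on the same one, so the answer that appears most frequently is the one least sensitive to sampling noise. Voting this way is a cheap Monte Carlo approximation of marginalizing over reasoning paths and then picking the most probable answer.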
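One way to write down what the vote approximates is shown here. The notation is introduced for illustration and is not from the original text: $x$ is the prompt, $r$ a reasoning trace, $a$ a final answer, and $(r_i, a_i)$ the $i$-th of $N$ sampled trace/answer pairs.

```latex
\hat{a} \;=\; \arg\max_{a} \sum_{r} P(r, a \mid x)
\;\approx\; \arg\max_{a} \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[a_i = a\right],
\qquad (r_i, a_i) \sim P(r, a \mid x)
```

The sum over all traces is intractable, so the empirical answer frequencies from $N$ samples stand in for it.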

The Recipe

  1. Set temperature ≥ 0.5. Deterministic decoding (T=0) produces identical traces: no diversity, no benefit.
  2. Sample N=5–21 paths. An odd N avoids ties when there are only two candidate answers; with more candidates, ties remain possible. More paths reduce variance, but cost grows linearly with N.
  3. Extract final answers. Parse each trace for the concluding answer, using a regex on the last line or a structured delimiter.
  4. Majority vote. Return the most common answer. Break ties by picking the answer from the highest-likelihood trace.
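
The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `sample_fn` is a hypothetical callable standing in for whatever model client you use, assumed to return a reasoning trace together with its total log-probability, and the `Answer:` delimiter is an assumed prompt convention.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Assumed convention: each trace ends with a line such as "Answer: 42".
ANSWER_RE = re.compile(r"Answer:\s*(.+?)\s*$", re.MULTILINE)


def extract_answer(trace: str) -> Optional[str]:
    """Return the last 'Answer: ...' value in a trace, or None if absent."""
    matches = ANSWER_RE.findall(trace)
    return matches[-1] if matches else None


def self_consistency(
    sample_fn: Callable[[str, float], tuple[str, float]],
    prompt: str,
    n: int = 5,
    temperature: float = 0.7,
) -> Optional[str]:
    """Sample n traces, majority-vote the answers, tie-break on log-probability.

    sample_fn is hypothetical: it takes (prompt, temperature) and returns
    (trace_text, trace_logprob).
    """
    votes: Counter = Counter()
    best_logprob: dict[str, float] = {}
    for _ in range(n):
        trace, logprob = sample_fn(prompt, temperature)
        answer = extract_answer(trace)
        if answer is None:
            continue  # unparseable traces do not vote
        votes[answer] += 1
        best_logprob[answer] = max(logprob, best_logprob.get(answer, float("-inf")))
    if not votes:
        return None
    top = max(votes.values())
    tied = [a for a, count in votes.items() if count == top]
    # Tie-break: the answer backed by the single highest-likelihood trace.
    return max(tied, key=lambda a: best_logprob[a])
```

Traces that cannot be parsed are dropped rather than counted, so a low parse rate quietly shrinks the effective N; logging the number of discarded traces is worth considering.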

When to Use

Arithmetic & Math

GSM8K, MATH — where a single misstep ruins the answer.

Commonsense QA

StrategyQA, Date Understanding — multiple plausible paths.

Code Generation

Sample N solutions, pick the one that passes unit tests.

Symbolic Reasoning

Last-letter concatenation, coin-flip tracking.
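
For the code-generation case above, selection is by test outcome rather than by vote. A rough sketch, assuming each candidate is Python source that defines a function named `solve` (a hypothetical entry point) and that tests are `(args, expected)` pairs:

```python
from typing import Any, Optional, Sequence


def first_passing(
    candidates: Sequence[str],
    tests: Sequence[tuple[tuple, Any]],
    entry_point: str = "solve",  # hypothetical function name the candidates define
) -> Optional[str]:
    """Return the first candidate source whose entry_point passes every test."""
    for source in candidates:
        namespace: dict = {}
        try:
            # exec is unsafe for untrusted model output; sandbox it in practice.
            exec(source, namespace)
            fn = namespace[entry_point]
            if all(fn(*args) == expected for args, expected in tests):
                return source
        except Exception:
            continue  # syntax errors, missing entry point, runtime failures
    return None
```

If several candidates pass, this returns the first one; voting among the passing candidates on held-out inputs is another option.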

Cost Note

Self-consistency multiplies token spend by N. For latency-sensitive deployments, run the N samples in parallel and aggregate. For budget-constrained pipelines, start with N=5 and increase only if answer entropy remains high.
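
Both suggestions can be combined in one loop: sample a small batch in parallel, measure the entropy of the answers so far, and sample more only while it stays high. The sketch below makes several assumptions: `sample_answer` is a hypothetical zero-argument callable that runs one sample and returns its extracted answer, and the 0.8-bit threshold and batch step of 4 are illustrative values, not tuned recommendations.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the empirical answer distribution."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sample_until_confident(
    sample_answer: Callable[[], str],  # hypothetical: one sample, one answer
    start_n: int = 5,
    max_n: int = 21,
    step: int = 4,
    max_entropy_bits: float = 0.8,  # assumed threshold, tune per task
) -> tuple[str, int]:
    """Grow the pool in parallel batches until entropy drops or max_n is hit.

    Returns the majority answer and the number of samples actually drawn.
    """
    answers: list[str] = []
    batch = start_n
    with ThreadPoolExecutor(max_workers=max_n) as pool:
        while True:
            # Each batch runs concurrently, so latency is roughly one call.
            futures = [pool.submit(sample_answer) for _ in range(batch)]
            answers.extend(f.result() for f in futures)
            if answer_entropy(answers) <= max_entropy_bits or len(answers) >= max_n:
                break
            batch = min(step, max_n - len(answers))
    winner, _ = Counter(answers).most_common(1)[0]
    return winner, len(answers)
```

Threads fit here because model calls are typically I/O-bound. With five samples, a 4-to-1 split has an entropy of about 0.72 bits and stops at the assumed threshold, while a 3-to-2 split (about 0.97 bits) triggers another batch.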