Back to DocsRecipe

A/B Metric Design

How to pick a primary metric that actually detects the effect you care about — and avoid the silent killers that make experiments lie.

The North Star metric

Your primary metric must be sensitive to the mechanism you are changing. If you run a latency experiment and measure revenue, you will need millions of users and weeks of runtime. Measure p95 page-load instead.

Guardrail metrics

Every experiment needs a small set of non-negotiable metrics that must not degrade: crash rate, checkout success, PII leakage. If a guardrail moves, the experiment is a no-ship regardless of the primary result.

The ratio trap

Metrics defined as X / Y (CTR, conversion rate) inflate false positives when the denominator is small or correlated with treatment. Prefer the delta method or linearization over raw ratio t-tests.

Pre-registration

Lock your primary metric, guardrails, and analysis window before peeking at data. Post-hoc metric shopping is the number-one source of non-reproducible results.

Quick checklist

1.Is the primary metric directionally aligned with long-term value?
2.Are guardrails defined and monitored in real time?
3.Is the analysis window fixed before launch?
4.Have you run an A/A test to validate variance estimates?