A/B Metric Design
How to pick a primary metric that actually detects the effect you care about, and avoid the silent killers that make experiments lie.
The North Star metric
Your primary metric must be sensitive to the mechanism you are changing. If you run a latency experiment and measure revenue, the effect has to travel through several noisy steps before it shows up, so you will need millions of users and weeks of runtime to detect it. Measure p95 page-load time instead.
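To make the sensitivity argument concrete, here is a rough sketch of a sample-size estimate using the usual normal approximation for a two-sided, two-sample test. The standard deviations and effect sizes are invented for illustration, not measured. The function name `users_per_arm` is hypothetical. The formula applies to a difference in means, so mean page-load time stands in for p95 here; a quantile metric would need its own variance estimate, for example from a bootstrap.

```python
from statistics import NormalDist


def users_per_arm(sigma, delta, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a difference in means.

    sigma: per-user standard deviation of the metric
    delta: smallest absolute difference in means worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2


# Hypothetical numbers, for illustration only.
# Revenue per user: sd of 20.00, hoped-for lift of 0.01 per user.
revenue_n = users_per_arm(sigma=20.0, delta=0.01)
# Mean page-load time in ms: sd of 800, hoped-for improvement of 50 ms.
latency_n = users_per_arm(sigma=800.0, delta=50.0)

print(f"revenue metric: ~{revenue_n:,.0f} users per arm")
print(f"latency metric: ~{latency_n:,.0f} users per arm")
```

Under these assumed inputs the revenue metric needs tens of millions of users per arm, while the latency metric needs only a few thousand. The gap comes entirely from the ratio of noise to expected effect.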
Guardrail metrics
Every experiment needs a small set of non-negotiable metrics that must not degrade, such as crash rate, checkout success rate, and PII leakage. If a guardrail degrades, the experiment is a no-ship regardless of the primary result.
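The guardrail rule can be sketched as a simple gate in front of the ship decision. All names here (`GuardrailResult`, `degraded`, `ship_decision`) are hypothetical. The code also assumes that "degraded" means the whole confidence interval for treatment minus control lies on the harmful side of zero, which is one possible convention rather than the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardrailResult:
    name: str
    ci_low: float          # lower CI bound for (treatment - control)
    ci_high: float         # upper CI bound for (treatment - control)
    higher_is_worse: bool  # True for crash rate, False for checkout success


def degraded(g):
    """Assumed convention: degraded when the whole CI is on the harmful side."""
    return g.ci_low > 0 if g.higher_is_worse else g.ci_high < 0


def ship_decision(primary_win, guardrails):
    """Return (decision, names of degraded guardrails)."""
    broken = [g.name for g in guardrails if degraded(g)]
    if broken:
        # A degraded guardrail overrides any win on the primary metric.
        return "no-ship", broken
    return ("ship" if primary_win else "no-ship"), []


# Hypothetical results, for illustration only.
results = [
    GuardrailResult("crash_rate", 0.001, 0.004, higher_is_worse=True),
    GuardrailResult("checkout_success", -0.002, 0.003, higher_is_worse=False),
]
print(ship_decision(primary_win=True, guardrails=results))
```

Keeping the gate separate from the primary analysis makes it harder to argue a degraded guardrail away after seeing a strong primary result.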
The ratio trap
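Metrics defined as X / Y (CTR, conversion rate) inflate false positives when the denominator is small or correlated with treatment, and when the unit of analysis (for example, a page view) is finer than the unit of randomization (for example, a user). In that case a raw t-test treats correlated events from the same user as independent and underestimates the variance. Prefer the delta method or linearization over raw ratio t-tests.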
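A minimal sketch of the delta method for a ratio of means follows, assuming users are the randomization unit and that `x[i]` and `y[i]` are per-user totals (for example clicks and page views). The function names are hypothetical and the data is made up. The variance uses the first-order approximation Var(R) ≈ (Var(x) - 2R·Cov(x, y) + R²·Var(y)) / (n·ȳ²).

```python
import math
from statistics import NormalDist, mean


def ratio_and_variance(x, y):
    """Ratio of means mean(x)/mean(y) and its delta-method variance."""
    n = len(x)
    mx, my = mean(x), mean(y)
    r = mx / my
    var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
    var_y = sum((b - my) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # First-order Taylor expansion of x_bar / y_bar around the means.
    var_r = (var_x - 2 * r * cov_xy + r * r * var_y) / (n * my * my)
    return r, var_r


def ratio_z_test(x_c, y_c, x_t, y_t):
    """Two-sided z-test on the difference of two ratio metrics."""
    r_c, v_c = ratio_and_variance(x_c, y_c)
    r_t, v_t = ratio_and_variance(x_t, y_t)
    z = (r_t - r_c) / math.sqrt(v_c + v_t)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r_t - r_c, z, p


# Made-up per-user totals: clicks (x) and page views (y).
clicks_c, views_c = [1, 0, 2, 1, 0, 3], [10, 5, 12, 8, 4, 20]
clicks_t, views_t = [2, 1, 2, 1, 1, 4], [11, 6, 10, 9, 5, 22]
diff, z, p = ratio_z_test(clicks_c, views_c, clicks_t, views_t)
print(f"CTR difference = {diff:.4f}, z = {z:.2f}, p = {p:.3f}")
```

The key design choice is that variance is computed over users, not over page views, so correlation between events from the same user is accounted for.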
Pre-registration
Lock your primary metric, guardrails, and analysis window before looking at any data. Post-hoc metric shopping is a leading source of non-reproducible results.
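One lightweight way to make the lock verifiable is to fingerprint the analysis plan before launch and compare the fingerprint at analysis time. This is a sketch under stated assumptions: the plan fields, the experiment name, and the `plan_fingerprint` helper are all hypothetical, and storing the hash somewhere tamper-evident (a ticket, a commit) is left to your own process.

```python
import hashlib
import json


def plan_fingerprint(plan):
    """SHA-256 of the plan in a canonical JSON form (keys sorted)."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical plan, for illustration only.
plan = {
    "experiment": "checkout_latency_v1",
    "primary_metric": "p95_page_load_ms",
    "guardrails": ["crash_rate", "checkout_success", "pii_leakage"],
    "analysis_window_days": 14,
}

registered = plan_fingerprint(plan)  # record this before launch

# At analysis time, recompute and compare before reading any results.
if plan_fingerprint(plan) != registered:
    raise RuntimeError("analysis plan changed after registration")
```

Any change to the primary metric, guardrails, or window produces a different fingerprint, which makes a post-hoc switch visible rather than silent.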
Quick checklist
1. Is the primary metric directionally aligned with long-term value?
2. Are guardrails defined and monitored in real time?
3. Is the analysis window fixed before launch?
4. Have you run an A/A test to validate variance estimates?
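The A/A check in the last item can be sketched as a simulation: split the same population at random many times, run the planned test on each split, and confirm that the share of "significant" results is close to the chosen alpha. The helper names are hypothetical, the data is synthetic, and a large-sample z-test on means stands in for whatever test you actually plan to run.

```python
import math
import random
from statistics import NormalDist, mean, variance


def z_test_p_value(a, b):
    """Two-sided large-sample z-test on the difference in means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def aa_false_positive_rate(values, runs=400, alpha=0.05, seed=0):
    """Share of random A/A splits that the test flags as significant."""
    rng = random.Random(seed)
    pool = list(values)
    half = len(pool) // 2
    hits = 0
    for _ in range(runs):
        rng.shuffle(pool)  # random assignment with no real treatment
        if z_test_p_value(pool[:half], pool[half:]) < alpha:
            hits += 1
    return hits / runs


# Synthetic skewed per-user metric, for illustration only.
data_rng = random.Random(42)
metric = [data_rng.expovariate(1.0) for _ in range(1000)]

rate = aa_false_positive_rate(metric)
print(f"A/A false-positive rate: {rate:.3f} (target is about 0.05)")
```

A rate well above alpha suggests the variance estimate is too small, for example because of the ratio-metric issue described earlier, and should be fixed before trusting any A/B result.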