Multi-armed bandit exploration
Dynamically allocate traffic across feature variants using Thompson sampling. Maximize cumulative reward and minimize regret, with no static A/B splits.
Overview
A multi-armed bandit treats each feature flag variant as an arm. Meridian tracks conversion events per arm and uses Bayesian inference to shift traffic toward the best-performing variant in real time.
Setup
{
  "strategy": "thompson_sampling",
  "arms": ["control", "variant_a", "variant_b"],
  "metric": "checkout_completed",
  "min_samples": 100
}
How it works
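1. Each arm starts with a Beta(1,1) prior, which is uniform over conversion rates from 0 to 1.
2. On every request, sample from each arm's posterior and serve the arm with the highest draw.
3. Update the served arm's Beta distribution with the observed outcome: a success increments its first parameter, a failure increments its second.
4. Underperformers naturally receive less traffic as confidence grows, because their posterior draws rarely come out highest.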
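The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Meridian's implementation: the `BetaArm` class, the `choose_arm` and `simulate` helpers, and the conversion rates passed to the simulation are all hypothetical and chosen only to make the behavior visible.

```python
import random


class BetaArm:
    """One variant with a Beta(alpha, beta) posterior over its conversion rate."""

    def __init__(self, name):
        self.name = name
        self.alpha = 1.0  # Beta(1,1) prior: one pseudo-success
        self.beta = 1.0   # ... and one pseudo-failure

    def sample(self, rng):
        # Draw a plausible conversion rate from the current posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, converted):
        # A success increments alpha, a failure increments beta.
        if converted:
            self.alpha += 1
        else:
            self.beta += 1


def choose_arm(arms, rng):
    # Thompson sampling: one draw per arm, serve the highest draw.
    return max(arms, key=lambda arm: arm.sample(rng))


def simulate(true_rates, n_requests, seed=0):
    # true_rates is a hypothetical ground truth used only for this simulation.
    rng = random.Random(seed)
    arms = [BetaArm(name) for name in true_rates]
    pulls = {name: 0 for name in true_rates}
    for _ in range(n_requests):
        arm = choose_arm(arms, rng)
        converted = rng.random() < true_rates[arm.name]
        arm.update(converted)
        pulls[arm.name] += 1
    return arms, pulls


arms, pulls = simulate(
    {"control": 0.05, "variant_a": 0.06, "variant_b": 0.20}, n_requests=5000
)
for arm in arms:
    print(arm.name, pulls[arm.name], arm.alpha, arm.beta)
```

In a run like this, the arm with the clearly higher simulated rate should end up with most of the traffic, while the other arms still receive occasional requests whenever their posterior draw happens to come out on top.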
Guardrails
Set a minimum sample size so the bandit does not start shifting traffic on noisy early data. Pin a holdout percentage that stays outside the adaptive allocation to maintain statistical validity. Meridian trips a circuit breaker if any arm's conversion rate drops below a configurable floor.
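One way these guardrails could combine with the sampling step is sketched below. Everything here is an assumption made for illustration: the `route` function, its `holdout` and `floor` parameters, the choice to serve holdout and warm-up traffic uniformly at random, and the choice to exclude a tripped arm rather than pause the whole experiment are not taken from Meridian. Only the idea of a minimum sample size, a holdout percentage, and a conversion floor comes from the text above.

```python
import random


def route(stats, rng, min_samples=100, holdout=0.10, floor=0.01):
    """Pick an arm for one request.

    stats maps arm name -> (successes, failures).
    Returns (arm_name, reason), where reason is "uniform" or "bandit".
    """
    # Circuit breaker (assumed behavior): drop arms whose observed
    # conversion rate is below the floor once they have enough samples.
    live = []
    for name, (successes, failures) in stats.items():
        n = successes + failures
        if n >= min_samples and successes / n < floor:
            continue
        live.append(name)
    if not live:
        raise RuntimeError("circuit breaker tripped on every arm")

    # Minimum sample size: split evenly until every live arm has enough data.
    warming_up = any(sum(stats[name]) < min_samples for name in live)

    # Holdout (assumed behavior): a fixed share of traffic is always uniform.
    if warming_up or rng.random() < holdout:
        return rng.choice(live), "uniform"

    # Otherwise sample each arm's Beta posterior, starting from Beta(1,1).
    draws = {
        name: rng.betavariate(1 + stats[name][0], 1 + stats[name][1])
        for name in live
    }
    return max(draws, key=draws.get), "bandit"


rng = random.Random(0)
stats = {"control": (50, 950), "variant_a": (0, 500), "variant_b": (80, 920)}
print(route(stats, rng))
```

In this example `variant_a` has no conversions in 500 requests, so it falls below the floor and is never served. The remaining traffic goes to the bandit except for the holdout share.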