Recipe

Multi-armed bandit exploration

Dynamically allocate traffic across feature variants using Thompson sampling. Maximize reward and minimize regret instead of holding traffic at a static A/B split.

Overview

A multi-armed bandit treats each feature flag variant as an arm. Meridian tracks conversion events per arm and uses Bayesian inference to shift traffic toward the best-performing variant in real time.

Setup

{
  "strategy": "thompson_sampling",
  "arms": ["control", "variant_a", "variant_b"],
  "metric": "checkout_completed",
  "min_samples": 100
}
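As a rough illustration of how a client might read this config before serving traffic, the sketch below parses the JSON and creates an empty success/failure counter per arm. The function names `parse_bandit_config` and `initial_counts` and the validation rules (at least two distinct arms, non-negative integer `min_samples`) are assumptions made for this example, not part of Meridian's API.

```python
import json

# Keys taken from the example config above.
REQUIRED_KEYS = {"strategy", "arms", "metric", "min_samples"}


def parse_bandit_config(raw):
    """Parse the JSON config and apply a few sanity checks.

    The checks are illustrative assumptions, not documented Meridian rules.
    """
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    arms = config["arms"]
    if len(arms) < 2 or len(set(arms)) != len(arms):
        raise ValueError("need at least two distinct arms")
    min_samples = config["min_samples"]
    if not isinstance(min_samples, int) or min_samples < 0:
        raise ValueError("min_samples must be a non-negative integer")
    return config


def initial_counts(config):
    """Start every arm with zero observed successes and failures."""
    return {arm: {"successes": 0, "failures": 0} for arm in config["arms"]}


EXAMPLE = """
{
  "strategy": "thompson_sampling",
  "arms": ["control", "variant_a", "variant_b"],
  "metric": "checkout_completed",
  "min_samples": 100
}
"""

config = parse_bandit_config(EXAMPLE)
counts = initial_counts(config)
```

Here `counts` holds one zeroed counter per arm, which is the state the sampling steps later in this recipe would update.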

How it works

1. Each arm starts with a Beta(1,1) prior, which is uniform over conversion rates from 0 to 1.
2. On every request, sample from each posterior and pick the arm with the highest draw.
3. Update the served arm's Beta distribution with the observed outcome: a success increments the first parameter, a failure the second.
4. Underperformers naturally receive less traffic as confidence grows.
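The four steps above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not Meridian's implementation: the `BetaArm` class, the `choose_arm` and `simulate` helpers, and the true conversion rates passed to the simulation are all hypothetical and exist only to show the mechanics.

```python
import random


class BetaArm:
    """Beta posterior over one arm's conversion rate."""

    def __init__(self, name):
        self.name = name
        self.alpha = 1.0  # 1 + observed successes (Beta(1,1) prior)
        self.beta = 1.0   # 1 + observed failures

    def sample(self, rng):
        # Step 2: draw a plausible conversion rate from the posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, converted):
        # Step 3: a success bumps alpha, a failure bumps beta.
        if converted:
            self.alpha += 1
        else:
            self.beta += 1


def choose_arm(arms, rng):
    """Pick the arm whose posterior draw is highest for this request."""
    return max(arms, key=lambda arm: arm.sample(rng))


def simulate(true_rates, n_requests, seed=0):
    """Serve n_requests against made-up true conversion rates."""
    rng = random.Random(seed)
    arms = [BetaArm(name) for name in true_rates]  # Step 1: flat priors
    pulls = {name: 0 for name in true_rates}
    for _ in range(n_requests):
        arm = choose_arm(arms, rng)
        pulls[arm.name] += 1
        converted = rng.random() < true_rates[arm.name]
        arm.update(converted)
    return arms, pulls


# Hypothetical rates, chosen only to make the traffic shift visible.
arms, pulls = simulate(
    {"control": 0.05, "variant_a": 0.10, "variant_b": 0.50},
    n_requests=5000,
)
```

Printing `pulls` after the run shows step 4 in action: as the posteriors narrow, most requests go to the arm with the highest true rate, while the weaker arms still receive an occasional exploratory request.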

Guardrails

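Set a minimum sample size (min_samples in the setup config) before the bandit starts shifting traffic, so early noise does not drive allocation. Pin a holdout percentage to maintain statistical validity: that share of traffic stays outside the adaptive allocation and serves as a stable baseline. Meridian fires a circuit breaker if any arm drops below a configurable conversion floor.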
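One way the three guardrails could combine is sketched below. Several behaviors here are assumptions, because the recipe does not specify them: holdout traffic is served the `control` arm, traffic is split evenly until every arm reaches the minimum sample size, and a tripped circuit breaker removes the arm from rotation. The names `tripped_arms` and `pick_arm` and the default `holdout_pct` and `conversion_floor` values are hypothetical.

```python
import random


def observed_rate(successes, failures):
    total = successes + failures
    return successes / total if total else 0.0


def tripped_arms(stats, min_samples, conversion_floor):
    """Arms with enough data whose observed rate sits below the floor."""
    return {
        name
        for name, (s, f) in stats.items()
        if s + f >= min_samples and observed_rate(s, f) < conversion_floor
    }


def pick_arm(stats, rng, min_samples=100, holdout_pct=0.10,
             conversion_floor=0.01, holdout_arm="control"):
    """Choose an arm for one request; stats maps name -> (successes, failures)."""
    # Holdout: a fixed share of traffic bypasses the bandit (assumed: control).
    if rng.random() < holdout_pct:
        return holdout_arm
    # Warm-up: split evenly until every arm has min_samples observations.
    if any(s + f < min_samples for s, f in stats.values()):
        return rng.choice(sorted(stats))
    # Circuit breaker: drop arms below the conversion floor (assumed behavior).
    tripped = tripped_arms(stats, min_samples, conversion_floor)
    live = [name for name in stats if name not in tripped]
    if not live:
        return holdout_arm
    # Thompson sampling over the remaining arms, with Beta(1,1) priors.
    return max(
        live,
        key=lambda n: rng.betavariate(1 + stats[n][0], 1 + stats[n][1]),
    )


# Hypothetical counts: variant_a has converted 2 of 1000 requests.
stats = {"control": (50, 950), "variant_a": (2, 998), "variant_b": (80, 920)}
rng = random.Random(0)
choice = pick_arm(stats, rng)
```

With these counts `variant_a` falls below the 1% floor, so only `control` and `variant_b` remain eligible. Checking the holdout first keeps its share of traffic constant regardless of what the bandit has learned.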
