Recipe

Multi-armed bandit exploration

Dynamically allocate traffic across feature variants using Thompson sampling. Maximize reward while minimizing regret, without static A/B splits.

Overview

A multi-armed bandit treats each feature flag variant as an arm. Meridian tracks conversion events per arm and uses Bayesian inference to shift traffic toward the best-performing variant in real time.
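For a binary metric such as a conversion, the Bayesian update has a closed form. Assuming each request served by arm k converts independently with an unknown rate θ_k, and that the arm's prior is Beta(α, β), observing s_k conversions and f_k non-conversions gives the posterior and posterior mean:

$$
\theta_k \mid s_k, f_k \sim \mathrm{Beta}(\alpha + s_k,\ \beta + f_k),
\qquad
\mathbb{E}[\theta_k \mid s_k, f_k] = \frac{\alpha + s_k}{\alpha + \beta + s_k + f_k}
$$

With a Beta(1, 1) prior, an arm that converted 30 of 100 requests has posterior Beta(31, 71) and posterior mean 31/102 ≈ 0.30. The posterior narrows as s_k + f_k grows, which is what lets traffic concentrate on the best arm.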

Setup

{
  "strategy": "thompson_sampling",
  "arms": ["control", "variant_a", "variant_b"],
  "metric": "checkout_completed",
  "min_samples": 100
}
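To make the fields concrete, here is a minimal Python sketch that loads this config into a typed object and rejects obviously bad values. The BanditConfig class, the load_config function, and the validation rules (at least two distinct arms, a positive integer min_samples, and only the thompson_sampling strategy shown above) are illustrative assumptions, not Meridian's API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BanditConfig:
    """Hypothetical typed view of the recipe's JSON config (not Meridian's API)."""
    strategy: str
    arms: tuple[str, ...]
    metric: str
    min_samples: int


def load_config(text: str) -> BanditConfig:
    """Parse the JSON config and reject obviously invalid values (assumed rules)."""
    raw = json.loads(text)
    arms = tuple(raw["arms"])
    # Assumption: only the strategy shown in this recipe is accepted.
    if raw["strategy"] != "thompson_sampling":
        raise ValueError(f"unsupported strategy: {raw['strategy']!r}")
    # A bandit needs at least two arms, and arm names must be unique.
    if len(arms) < 2 or len(set(arms)) != len(arms):
        raise ValueError("need at least two distinct arms")
    if not isinstance(raw["min_samples"], int) or raw["min_samples"] < 1:
        raise ValueError("min_samples must be a positive integer")
    return BanditConfig(
        strategy=raw["strategy"],
        arms=arms,
        metric=raw["metric"],
        min_samples=raw["min_samples"],
    )


CONFIG_JSON = """
{
  "strategy": "thompson_sampling",
  "arms": ["control", "variant_a", "variant_b"],
  "metric": "checkout_completed",
  "min_samples": 100
}
"""

config = load_config(CONFIG_JSON)
print(config.arms)  # ('control', 'variant_a', 'variant_b')
```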

How it works

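  1. Each arm starts with a uniform Beta(1, 1) prior over its conversion rate.
  2. On every request, draw one sample from each arm's posterior and serve the arm with the highest draw.
  3. Update the served arm's Beta(α, β) distribution with the observed outcome: a success (conversion) adds 1 to α, a failure adds 1 to β.
  4. Underperforming arms naturally receive less traffic as the posteriors narrow and confidence grows.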
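The four steps map onto a short Beta-Bernoulli Thompson sampler. This is a minimal Python sketch built on the standard library's random.Random.betavariate; the ThompsonSampler class and the simulated conversion rates in TRUE_RATES are hypothetical and exist only to show traffic drifting toward the strongest arm.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over named arms (illustrative sketch)."""

    def __init__(self, arms, seed=None):
        # Step 1: every arm starts with a uniform Beta(1, 1) prior,
        # stored as [alpha, beta] = [1 + successes, 1 + failures].
        self.params = {arm: [1.0, 1.0] for arm in arms}
        self.rng = random.Random(seed)

    def choose(self):
        # Step 2: draw one sample from each arm's posterior and
        # serve the arm with the highest draw.
        draws = {arm: self.rng.betavariate(a, b) for arm, (a, b) in self.params.items()}
        return max(draws, key=draws.get)

    def update(self, arm, converted):
        # Step 3: a conversion increments alpha; a non-conversion increments beta.
        if converted:
            self.params[arm][0] += 1
        else:
            self.params[arm][1] += 1


# Hypothetical true conversion rates, used only to simulate traffic.
TRUE_RATES = {"control": 0.05, "variant_a": 0.10, "variant_b": 0.20}

sampler = ThompsonSampler(TRUE_RATES, seed=7)
env = random.Random(42)
served = {arm: 0 for arm in TRUE_RATES}
for _ in range(5000):
    arm = sampler.choose()
    served[arm] += 1
    sampler.update(arm, env.random() < TRUE_RATES[arm])

# Step 4: the weaker arms should end up with a small share of traffic.
print(served)
```

Because each decision is a random draw rather than an argmax of posterior means, arms with few observations still receive occasional traffic, so the sampler keeps exploring without a separate exploration rate.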

Guardrails

Set a minimum sample size (min_samples in the setup config) before the bandit starts shifting traffic. Pin a holdout percentage, a fixed share of traffic that the bandit never reallocates, to maintain statistical validity. Meridian trips a circuit breaker if any arm's conversion rate drops below a configurable floor.
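One way the three guardrails could wrap a Thompson sampler is sketched here in Python. The GuardedBandit class and its parameters are hypothetical, and Meridian's actual behavior may differ. In particular, this sketch assumes that the holdout share is always served control and recorded separately from the bandit's counts, that arms are picked uniformly at random until each active arm has min_samples observations, and that the circuit breaker permanently disables an arm whose observed conversion rate falls below the floor.

```python
import random


class GuardedBandit:
    """Thompson sampling with three guardrails (hypothetical sketch, not Meridian's API)."""

    def __init__(self, arms, min_samples=100, holdout=0.10,
                 holdout_arm="control", conversion_floor=0.01, seed=None):
        self.arms = list(arms)
        if holdout_arm not in self.arms:
            raise ValueError("holdout_arm must be one of the arms")
        self.min_samples = min_samples
        self.holdout = holdout              # assumed: fixed share of traffic pinned to holdout_arm
        self.holdout_arm = holdout_arm
        self.conversion_floor = conversion_floor
        self.successes = {a: 0 for a in self.arms}
        self.trials = {a: 0 for a in self.arms}
        self.holdout_successes = 0          # holdout outcomes are kept out of the bandit
        self.holdout_trials = 0
        self.tripped = set()                # arms disabled by the circuit breaker
        self.rng = random.Random(seed)

    def choose(self):
        """Return (arm, is_holdout) for one request."""
        # Holdout: a fixed share of requests bypasses the bandit entirely.
        if self.rng.random() < self.holdout:
            return self.holdout_arm, True
        active = [a for a in self.arms if a not in self.tripped]
        # Minimum sample size: pick uniformly until every active arm has enough data.
        if any(self.trials[a] < self.min_samples for a in active):
            return self.rng.choice(active), False

        # Otherwise, Thompson sampling over Beta(1 + successes, 1 + failures).
        def draw(a):
            return self.rng.betavariate(1 + self.successes[a],
                                        1 + self.trials[a] - self.successes[a])
        return max(active, key=draw), False

    def record(self, arm, converted, is_holdout):
        if is_holdout:
            self.holdout_trials += 1
            self.holdout_successes += int(converted)
            return
        self.trials[arm] += 1
        self.successes[arm] += int(converted)
        rate = self.successes[arm] / self.trials[arm]
        # Circuit breaker: once an arm has min_samples observations,
        # disable it if its observed conversion rate is below the floor.
        if (arm != self.holdout_arm and self.trials[arm] >= self.min_samples
                and rate < self.conversion_floor):
            self.tripped.add(arm)


# Simulated traffic with hypothetical true rates; variant_a never converts.
TRUE_RATES = {"control": 0.10, "variant_a": 0.0, "variant_b": 0.30}
bandit = GuardedBandit(TRUE_RATES, min_samples=50, seed=1)
env = random.Random(2)
for _ in range(5000):
    arm, is_holdout = bandit.choose()
    bandit.record(arm, env.random() < TRUE_RATES[arm], is_holdout)

print(sorted(bandit.tripped))
```

Keeping holdout outcomes in their own counters means the control baseline is never influenced by how the bandit allocated the rest of the traffic.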