A/B Testing Prompts + Models
Ship faster with data-driven prompt and model selection. Meridian deterministically buckets users, logs every variant, and surfaces statistically significant winners automatically.
Deterministic Bucketing
Every request is assigned a bucket using a stable hash of the user identifier. The same user always lands in the same bucket, ensuring consistent experiences across sessions.
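As an illustration of the idea, stable hashing can be sketched in a few lines of Python. This is a minimal sketch, not Meridian's actual implementation: the choice of SHA-256, the 100-bucket range, the experiment-id salt, and the names `assign_bucket` and `assign_variant` are all assumptions made for the example.

```python
import hashlib


def assign_bucket(user_id: str, experiment_id: str, num_buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, num_buckets).

    Hypothetical helper: the hash function, salt, and bucket count are
    assumptions, not documented Meridian behavior.
    """
    # Salting with the experiment id keeps assignments independent
    # from one experiment to the next.
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign_variant(user_id: str, experiment_id: str, split: dict) -> str:
    """Pick a variant from a {name: percent} split that sums to 100."""
    bucket = assign_bucket(user_id, experiment_id)
    upper = 0
    for name, percent in split.items():
        upper += percent
        if bucket < upper:
            return name
    raise ValueError("traffic split must sum to 100")


if __name__ == "__main__":
    split = {"v1_detailed": 50, "v2_concise": 50}
    # The same user and experiment always produce the same variant.
    print(assign_variant("user-123", "prompt-length-test", split))
```

A cryptographic digest is used here rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process by default and would not give the same bucket across sessions.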
Variant Logging
Every inference request logs the assigned prompt_version and model alongside the user bucket. These fields are queryable in the dashboard for side-by-side metric comparison.
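To make the shape of such a record concrete, here is a hedged sketch of a log entry. Only `prompt_version`, `model`, and the user bucket come from the description above; the class name `InferenceLogRecord`, the metric fields `latency_ms` and `total_tokens`, and the JSON serialization are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceLogRecord:
    """Hypothetical log record; not Meridian's actual schema."""

    user_bucket: int     # bucket assigned to the user
    prompt_version: str  # e.g. "v2_concise"
    model: str           # e.g. "meridian-7b"
    latency_ms: float    # assumed metric field
    total_tokens: int    # assumed metric field


def to_log_line(record: InferenceLogRecord) -> str:
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = InferenceLogRecord(
        user_bucket=42,
        prompt_version="v2_concise",
        model="meridian-7b",
        latency_ms=284.0,
        total_tokens=612,
    )
    print(to_log_line(record))
```

Keeping the variant fields on every record is what allows results to be grouped by variant later without joining against a separate assignment table.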
Comparing Metrics
Group results by variant in the dashboard to compare latency, token usage, user satisfaction scores, and conversion rates. Once each variant's sample size reaches the configured threshold, Meridian runs a two-tailed Z-test and highlights the winner if the difference is statistically significant.
| Variant | Users | Avg Latency | Avg Tokens | Satisfaction |
|---|---|---|---|---|
| v2_concise · meridian-7b | 12,401 | 284ms | 612 | 94.2% |
| v1_detailed · meridian-3b | 12,389 | 412ms | 1,034 | 87.1% |
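For readers who want to see what a two-tailed Z-test involves, the sketch below implements the standard pooled two-proportion version. It is an illustration under stated assumptions, not Meridian's code: the function names, the `min_samples` and `alpha` defaults, and the reading of the Satisfaction column as the share of satisfied users are all assumptions.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int):
    """Return (z, two-tailed p-value) for a pooled two-proportion Z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every observation is identical, so there is no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def pick_winner(name_a, successes_a, n_a, name_b, successes_b, n_b,
                min_samples=1000, alpha=0.05):
    """Return the winning variant name, or None if there is no clear winner.

    Assumes a higher rate is better. min_samples and alpha are
    hypothetical defaults standing in for the configured threshold.
    """
    if min(n_a, n_b) < min_samples:
        return None
    z, p_value = two_proportion_z_test(successes_a, n_a, successes_b, n_b)
    if p_value >= alpha:
        return None
    return name_a if z > 0 else name_b


if __name__ == "__main__":
    # Counts derived from the table above, assuming Satisfaction is the
    # fraction of users who were satisfied.
    sat_a = round(0.942 * 12401)
    sat_b = round(0.871 * 12389)
    print(two_proportion_z_test(sat_a, 12401, sat_b, 12389))
    print(pick_winner("v2_concise", sat_a, 12401, "v1_detailed", sat_b, 12389))
```

Continuous metrics such as latency or token count would need a test on means using each variant's variance, which the table does not report.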
Configuration
Define experiments in the dashboard. Each experiment specifies the traffic split, variant definitions, and the metric used to declare a winner. Changes take effect on the next deployment.
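The three parts of an experiment definition can be pictured as a small data structure. Experiments are defined in the dashboard, so the sketch below is purely illustrative: the class names, field names, and validation rules are assumptions, not an actual Meridian API or file format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant definition: a prompt version paired with a model."""

    name: str
    prompt_version: str
    model: str
    traffic_percent: int


@dataclass(frozen=True)
class Experiment:
    """Hypothetical experiment: traffic split, variants, and winner metric."""

    name: str
    variants: Tuple[Variant, ...]
    winner_metric: str

    def __post_init__(self):
        # The traffic split must account for all traffic exactly once.
        total = sum(v.traffic_percent for v in self.variants)
        if total != 100:
            raise ValueError(f"traffic split sums to {total}, expected 100")
        # Variant names must be unique so results can be grouped by variant.
        names = [v.name for v in self.variants]
        if len(set(names)) != len(names):
            raise ValueError("variant names must be unique")


if __name__ == "__main__":
    experiment = Experiment(
        name="concise-vs-detailed",
        variants=(
            Variant("concise", "v2_concise", "meridian-7b", 50),
            Variant("detailed", "v1_detailed", "meridian-3b", 50),
        ),
        winner_metric="satisfaction",
    )
    print(experiment)
```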
Ready to experiment?
Create your first A/B test in the dashboard and start collecting data on your next deploy.