Recipe: Dataset drift detection
Monitor feature and label distributions in production to catch silent model degradation before it impacts users.
Overview
Dataset drift occurs when the statistical properties of incoming data diverge from the training distribution. Meridian compares live inference payloads against a stored baseline using the two-sample Kolmogorov-Smirnov test and Jensen-Shannon divergence, surfacing per-feature drift scores in real time.
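To make the two scores concrete, here is a minimal pure-Python sketch for a single numeric feature. It is not Meridian's implementation, and the function names are hypothetical. The KS part returns only the statistic (the largest gap between the two empirical CDFs), not a p-value. The JS part makes several assumptions: it bins both samples into 10 equal-width bins spanning the baseline's range, clamps out-of-range live values into the outer bins, and uses log base 2 so the divergence lies in [0, 1]. The JS distance is the square root of the JS divergence.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    two empirical CDFs. The supremum is reached at one of the observed points."""
    a, b = sorted(baseline), sorted(live)
    gap = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x (right-continuous ECDF).
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(f_a - f_b))
    return gap


def histogram(values, edges):
    """Normalized counts over fixed bins. Values outside the edges are clamped
    into the first or last bin (a simplifying assumption)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result is in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_scores(baseline, live, bins=10):
    """Drift scores for a single numeric feature (hypothetical helper)."""
    if not baseline or not live:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins for a constant baseline
    edges = [lo + (hi - lo) * k / bins for k in range(bins + 1)]
    js = js_divergence(histogram(baseline, edges), histogram(live, edges))
    return {
        "ks": ks_statistic(baseline, live),
        "js_divergence": js,
        # Clamp tiny negative rounding error before taking the square root.
        "js_distance": math.sqrt(max(js, 0.0)),
    }
```

Run `drift_scores` once per feature column. Identical samples score 0 on all three values, and the scores grow as the live sample moves away from the baseline.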
Prerequisites
- Meridian SDK v2.1+ instrumented in your inference pipeline
- A baseline dataset exported from your training or validation split
- At least 200 inference requests logged to establish a comparison window
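If you also buffer inference payloads on your side (for example, to reproduce a score offline), one option is a bounded window that stays "not ready" until it holds the 200 requests the prerequisites call for. This is a minimal sketch. `ComparisonWindow` and its `max_size` of 1000 are hypothetical and not part of the Meridian SDK.

```python
from collections import deque

MIN_WINDOW = 200  # minimum request count from the prerequisites


class ComparisonWindow:
    """Hypothetical helper that keeps the most recent inference payloads
    and reports when there are enough of them to compare against a baseline."""

    def __init__(self, max_size=1000, min_size=MIN_WINDOW):
        if max_size < min_size:
            raise ValueError("max_size must be at least min_size")
        self.min_size = min_size
        # A deque with maxlen drops the oldest payload once the window is full.
        self.rows = deque(maxlen=max_size)

    def log(self, payload):
        self.rows.append(payload)

    def ready(self):
        return len(self.rows) >= self.min_size

    def feature_values(self, name):
        # Skip payloads that do not carry this feature.
        return [row[name] for row in self.rows if name in row]
```

The lower bound exists because distribution tests on a handful of samples are noisy. The upper bound keeps the window focused on recent traffic.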
Steps
- Upload baseline
Navigate to Datasets → Baselines and upload a CSV or Parquet file containing your model's input features and the corresponding labels.
- Enable drift monitoring
On the model's configuration page, turn on drift detection, select the baseline you uploaded, and set a drift threshold (default: 0.15 JS distance).
- Inspect drift reports
Open Monitoring → Drift to view per-feature scores, historical trends, and the automated alerts raised whenever a score exceeds the threshold.
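The local side of these steps can be approximated in a few lines. This is a sketch under several assumptions. `to_baseline_csv` and `breached_features` are hypothetical helpers, not Meridian APIs. The baseline is assumed to be one header row of column names (which should match the feature names in your inference payloads) followed by one row per example. The threshold check uses a strict `>`, and this recipe does not say whether Meridian itself alerts at `>` or `>=`.

```python
import csv
import io

DEFAULT_THRESHOLD = 0.15  # default drift threshold from step 2


def to_baseline_csv(rows):
    """Serialize baseline rows (dicts of feature and label values) to CSV text
    with a header row. For large splits, write to an open file instead."""
    if not rows:
        raise ValueError("baseline is empty")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)  # raises ValueError if a row has unexpected keys
    return buf.getvalue()


def breached_features(scores, threshold=DEFAULT_THRESHOLD):
    """Return (feature, score) pairs strictly above the threshold, worst first."""
    flagged = [(name, s) for name, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

For example, with scores `{"age": 0.04, "income": 0.31, "region": 0.18}` only `income` and `region` are flagged, in that order.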
Interpreting results
A Jensen-Shannon score above the drift threshold (0.15 JS distance by default) indicates a meaningful distribution shift. Pair it with prediction accuracy metrics to distinguish benign drift (such as seasonal patterns) from harmful drift (such as data pipeline bugs or upstream schema changes).
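One way to encode that pairing is a small triage rule. This is a hypothetical heuristic, not something Meridian computes. It assumes you have ground-truth labels for recent production traffic, which often arrive late, so you can measure live accuracy. The 0.02 allowed accuracy drop is an arbitrary placeholder to tune per model.

```python
def triage(drift_score, baseline_accuracy, live_accuracy,
           threshold=0.15, max_accuracy_drop=0.02):
    """Hypothetical rule of thumb combining a drift score with accuracy.

    Returns one of:
      "no_drift"        score at or below the threshold
      "likely_benign"   drift, but accuracy is holding up (e.g. a seasonal shift)
      "likely_harmful"  drift together with an accuracy drop (e.g. a pipeline bug)
    """
    if drift_score <= threshold:
        return "no_drift"
    if baseline_accuracy - live_accuracy > max_accuracy_drop:
        return "likely_harmful"
    return "likely_benign"
```

A "likely_benign" result still deserves a look at the drifting features. The rule only says the model appears to be coping with the shift for now.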