Online model eval pipeline
Ship a shadow scoring path that logs predictions alongside ground truth without blocking the user-facing response.
Architecture
Deploy a sidecar evaluation worker that consumes a Kafka topic of scored requests. The primary API writes prediction metadata to the topic asynchronously and returns the production model result immediately.
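One way to keep the topic write off the request path is a bounded in-process buffer drained by a background thread. The sketch below is a minimal illustration under that assumption, not the pipeline's actual producer: `eval_buffer`, `emit_prediction`, `drain`, the buffer size, and the `publish` callable (a stand-in for whatever Kafka producer client is in use) are all hypothetical names.

```python
import queue
import threading

# Hypothetical bounded buffer between the request path and the publisher
# thread. The size is an illustrative assumption, not a recommended value.
eval_buffer = queue.Queue(maxsize=10_000)


def emit_prediction(request_id, model_version, prediction, latency_ms):
    """Enqueue prediction metadata without blocking the caller.

    Returns False (and drops the record) when the buffer is full, so the
    user-facing response is never delayed by the evaluation path.
    """
    record = {
        "request_id": request_id,
        "model_version": model_version,
        "prediction": prediction,
        "latency_ms": latency_ms,
    }
    try:
        eval_buffer.put_nowait(record)
        return True
    except queue.Full:
        return False


def drain(publish, stop):
    """Forward buffered records to `publish` until stopped and empty.

    `publish` is a hypothetical callable standing in for the real
    topic producer; `stop` is a threading.Event.
    """
    while not stop.is_set() or not eval_buffer.empty():
        try:
            record = eval_buffer.get(timeout=0.1)
        except queue.Empty:
            continue
        publish(record)
```

The endpoint would call `emit_prediction(...)` just before returning the production result, while `drain` runs in a daemon thread started at process startup. Dropping on a full buffer trades evaluation completeness for response latency, which matches the goal of never blocking the user-facing response.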
Steps
- Instrument your inference endpoint to emit request_id, model_version, prediction, latency_ms to a non-blocking queue.
- Join the prediction stream with delayed ground-truth labels (click, conversion, human review) keyed on request_id.
- Compute windowed metrics — accuracy, precision@k, calibration error — every 5 minutes in the sidecar.
- Expose a Prometheus /metrics endpoint from the sidecar for dashboarding and alerting.
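The join and windowed-metric steps above can be sketched in a few lines. This is a simplified batch illustration, not the sidecar's streaming implementation. It assumes binary labels, that prediction is the predicted probability of the positive class, and that each record carries a hypothetical `ts` field (seconds since epoch) that is not in the emitted field list. It computes accuracy and expected calibration error only; precision@k is omitted because it needs ranked candidate lists.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # the 5-minute window from the steps above


def join_on_request_id(predictions, labels):
    """Inner-join prediction records with ground-truth labels by request_id.

    Predictions whose label has not arrived yet are left out.
    """
    label_by_id = {row["request_id"]: row["label"] for row in labels}
    return [
        {**p, "label": label_by_id[p["request_id"]]}
        for p in predictions
        if p["request_id"] in label_by_id
    ]


def windowed_metrics(joined, n_bins=10, threshold=0.5):
    """Accuracy and expected calibration error (ECE) per tumbling window.

    `n_bins` and `threshold` are illustrative defaults (assumptions).
    Returns a dict keyed by window start time in seconds.
    """
    windows = defaultdict(list)
    for row in joined:
        windows[int(row["ts"] // WINDOW_SECONDS)].append(row)

    out = {}
    for index, rows in sorted(windows.items()):
        correct = sum(
            (r["prediction"] >= threshold) == bool(r["label"]) for r in rows
        )
        # Group rows into equal-width confidence bins for the ECE.
        bins = defaultdict(list)
        for r in rows:
            bins[min(int(r["prediction"] * n_bins), n_bins - 1)].append(r)
        ece = 0.0
        for members in bins.values():
            mean_conf = sum(r["prediction"] for r in members) / len(members)
            positive_rate = sum(r["label"] for r in members) / len(members)
            ece += len(members) / len(rows) * abs(mean_conf - positive_rate)
        out[index * WINDOW_SECONDS] = {
            "n": len(rows),
            "accuracy": correct / len(rows),
            "ece": ece,
        }
    return out
```

In a sidecar these values would typically be copied into gauges for scraping. With the Python prometheus_client library, `Gauge` objects and `start_http_server` are the usual way to serve them on /metrics.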
Guardrails
- Cap shadow traffic at 10% of production throughput.
- Drop eval writes when the topic lag exceeds 30 seconds.
- Auto-rollback the candidate model if p99 latency regresses.
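The three guardrails can be expressed as small pure functions, sketched below under several assumptions. Hash-based sampling holds shadow traffic to roughly 10% in expectation rather than enforcing a hard cap. The `tolerance` factor in the rollback check is hypothetical, since the text does not say how large a regression triggers rollback. All function names are illustrative.

```python
import hashlib
import math

SHADOW_FRACTION = 0.10   # shadow traffic cap from the guardrails above
MAX_LAG_SECONDS = 30.0   # topic lag threshold from the guardrails above


def in_shadow_sample(request_id, fraction=SHADOW_FRACTION):
    """Deterministically select about `fraction` of request IDs.

    Hashing the ID keeps the decision stable across retries and hosts.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(fraction * 2**64)


def should_drop_eval_write(topic_lag_seconds):
    """True when consumer lag exceeds the 30-second threshold."""
    return topic_lag_seconds > MAX_LAG_SECONDS


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty latency sample."""
    if not latencies_ms:
        raise ValueError("latency sample is empty")
    ordered = sorted(latencies_ms)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


def should_rollback(baseline_ms, candidate_ms, tolerance=1.10):
    """True when candidate p99 exceeds baseline p99 by more than `tolerance`.

    `tolerance` is a hypothetical threshold (10% here), not a value
    taken from the guardrails.
    """
    return p99(candidate_ms) > tolerance * p99(baseline_ms)
```

A hard cap on shadow throughput would need a rate limiter (for example a token bucket) in addition to, or instead of, the hash-based sampler.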