
Online model eval pipeline

Ship a shadow scoring path that logs predictions alongside ground truth without blocking the user-facing response.

Architecture

Deploy a sidecar evaluation worker that consumes a Kafka topic of scored requests. The primary API writes prediction metadata to the topic asynchronously and returns the production model result immediately.
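A minimal sketch of the non-blocking write path follows. It assumes a bounded in-process `queue.Queue` stands in for the Kafka producer's local buffer. `EVAL_QUEUE`, `PredictionEvent`, `production_model`, and `handle_request` are hypothetical names, and the scorer is a toy. The handler returns the production score without waiting on the eval write, and it drops the record if the buffer is full.

```python
import queue
import time
from dataclasses import asdict, dataclass

# Hypothetical bounded buffer standing in for an async Kafka producer's local queue.
EVAL_QUEUE: "queue.Queue[dict]" = queue.Queue(maxsize=1_000)


@dataclass
class PredictionEvent:
    request_id: str
    model_version: str
    prediction: float
    latency_ms: float


def production_model(features: dict) -> float:
    # Hypothetical toy scorer standing in for the real production model.
    return min(1.0, 0.1 * sum(features.values()))


def handle_request(request_id: str, features: dict, model_version: str = "prod-v1") -> float:
    """Score with the production model, log the event without blocking, return the score."""
    start = time.perf_counter()
    prediction = production_model(features)
    latency_ms = (time.perf_counter() - start) * 1000.0

    event = PredictionEvent(request_id, model_version, prediction, latency_ms)
    try:
        # put_nowait never waits; a full buffer raises queue.Full instead.
        EVAL_QUEUE.put_nowait(asdict(event))
    except queue.Full:
        pass  # Drop the eval record rather than delay the user-facing response.
    return prediction
```

With a real Kafka client, an asynchronous send plays the role of `put_nowait`. For example, confluent-kafka's `Producer.produce` raises `BufferError` when its local queue is full, and that error can be caught and dropped the same way.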

Steps

Instrument your inference endpoint to emit request_id, model_version, prediction, and latency_ms to a non-blocking queue that feeds the Kafka topic.
Join the prediction stream with delayed ground-truth labels (click, conversion, human review) keyed on request_id.
Compute windowed metrics — accuracy, precision@k, calibration error — every 5 minutes in the sidecar.
Expose a Prometheus /metrics endpoint from the sidecar for dashboarding and alerting.
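The join and metric steps can be sketched in the sidecar as follows. The sketch makes several assumptions: labels are binary, predictions are probabilities in [0, 1], and an in-memory dict replaces a durable join store. `LabelJoiner` and `window_metrics` are hypothetical names. The 0.5 accuracy threshold and the 10-bin expected calibration error (ECE) are illustrative choices rather than values from the recipe. Eviction of predictions whose labels never arrive is omitted.

```python
from collections import defaultdict


class LabelJoiner:
    """Buffers predictions until a delayed label with the same request_id arrives."""

    def __init__(self):
        self.pending = {}  # request_id -> prediction event (no TTL eviction in this sketch)
        self.joined = []   # (prediction, label) pairs accumulated for the current window

    def on_prediction(self, event):
        self.pending[event["request_id"]] = event

    def on_label(self, request_id, label):
        event = self.pending.pop(request_id, None)
        if event is not None:  # labels with no buffered prediction are ignored
            self.joined.append((event["prediction"], label))

    def flush_window(self):
        # Called on a timer (e.g. every 5 minutes); returns and resets the window.
        window, self.joined = self.joined, []
        return window


def accuracy(pairs, threshold=0.5):
    return sum((p >= threshold) == bool(y) for p, y in pairs) / len(pairs)


def precision_at_k(pairs, k):
    top = sorted(pairs, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(y for _, y in top) / len(top)


def expected_calibration_error(pairs, n_bins=10):
    bins = defaultdict(list)
    for p, y in pairs:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))  # p == 1.0 goes in the top bin
    ece = 0.0
    for members in bins.values():
        avg_conf = sum(p for p, _ in members) / len(members)
        avg_label = sum(y for _, y in members) / len(members)
        ece += len(members) / len(pairs) * abs(avg_conf - avg_label)
    return ece


def window_metrics(pairs, k=10):
    if not pairs:
        return {}
    return {
        "accuracy": accuracy(pairs),
        "precision_at_k": precision_at_k(pairs, k),
        "calibration_error": expected_calibration_error(pairs),
    }
```

In this sketch a window contains the pairs joined between two flushes, so it is keyed by label arrival time rather than prediction time. To serve /metrics, the `prometheus_client` library's `Gauge` could hold each value under a `model_version` label, updated from `window_metrics` after every flush and exposed with `start_http_server`.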

Guardrails

Cap shadow traffic at 10% of production throughput.
Drop eval writes when the topic lag exceeds 30 seconds.
Automatically roll back the candidate model if its p99 latency regresses relative to the production baseline.
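Each guardrail reduces to a simple check. This sketch approximates the 10% cap by hashing request_id into a sampling bucket, so the cap holds on average rather than as a hard rate limit. It assumes topic lag in seconds is measured elsewhere and passed in. The recipe gives no regression threshold, so `max_regression=0.10` is a hypothetical placeholder to tune. All function names here are illustrative.

```python
import hashlib

SHADOW_FRACTION = 0.10  # from the recipe: shadow traffic capped at 10% of production
MAX_LAG_SECONDS = 30.0  # from the recipe: drop eval writes beyond 30 s of topic lag


def in_shadow_sample(request_id: str, fraction: float = SHADOW_FRACTION) -> bool:
    # Deterministic: the same request_id always maps to the same bucket in [0, 1).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def should_write_eval(topic_lag_seconds: float) -> bool:
    return topic_lag_seconds <= MAX_LAG_SECONDS


def p99(latencies_ms):
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed with integer arithmetic.
    ordered = sorted(latencies_ms)
    rank = (len(ordered) * 99 + 99) // 100
    return ordered[rank - 1]


def should_roll_back(candidate_ms, production_ms, max_regression=0.10):
    # Hypothetical rule: roll back if candidate p99 exceeds production p99 by more than 10%.
    return p99(candidate_ms) > p99(production_ms) * (1 + max_regression)
```

Hash-based sampling, unlike `random.random()`, gives a retried request the same shadow decision, which keeps joins on request_id consistent.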
