Recipe

Online model eval pipeline

Ship a shadow scoring path that logs predictions alongside ground truth without blocking the user-facing response.

Architecture

Deploy a sidecar evaluation worker that consumes a Kafka topic of scored requests. The primary API writes prediction metadata to the topic asynchronously and returns the production model result immediately.
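
The asynchronous write path could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: the names emit_prediction, handle_request, and run_publisher are hypothetical, the in-process buffer size is arbitrary, and the publish callable stands in for whatever Kafka producer client you use.

```python
import queue
import threading
import time
from typing import Callable

# Hypothetical bounded buffer between the request path and the topic producer.
_events: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)


def emit_prediction(event: dict) -> bool:
    """Enqueue without blocking; return False if the buffer is full."""
    try:
        _events.put_nowait(event)
        return True
    except queue.Full:
        # Drop the eval record rather than delay the user-facing response.
        return False


def handle_request(request_id: str, features: dict,
                   model: Callable[[dict], float], model_version: str) -> float:
    """Score the request, hand metadata to the buffer, return immediately."""
    start = time.perf_counter()
    prediction = model(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    emit_prediction({
        "request_id": request_id,
        "model_version": model_version,
        "prediction": prediction,
        "latency_ms": latency_ms,
    })
    return prediction


def run_publisher(publish: Callable[[dict], None], stop: threading.Event) -> None:
    """Drain the buffer on a background thread.

    `publish` is an assumption: any callable that sends one event to the topic.
    The loop exits once `stop` is set and the buffer is empty.
    """
    while not (stop.is_set() and _events.empty()):
        try:
            event = _events.get(timeout=0.05)
        except queue.Empty:
            continue
        publish(event)
```

The request handler never waits on the topic: a full buffer costs one dropped eval record, not added latency.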

Steps

  1. Instrument your inference endpoint to emit request_id, model_version, prediction, and latency_ms to a non-blocking queue.
  2. Join the prediction stream with delayed ground-truth labels (click, conversion, human review) keyed on request_id.
  3. Compute windowed metrics (accuracy, precision@k, calibration error) every 5 minutes in the sidecar.
  4. Expose a Prometheus /metrics endpoint from the sidecar for dashboarding and alerting.
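
Steps 2 through 4 can be sketched in plain Python as follows. This is an illustrative outline only. It assumes binary labels and a probability-like score, uses a hypothetical Prediction record, computes accuracy and a simple binned calibration error (precision@k is left out for brevity), and renders the Prometheus text format by hand; the metric names eval_accuracy and eval_calibration_error are made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 300  # 5-minute windows, as in step 3


@dataclass
class Prediction:  # hypothetical record shape
    request_id: str
    model_version: str
    score: float      # assumed: predicted probability of the positive class
    timestamp: float  # seconds since epoch


def join_labels(predictions, labels):
    """Inner-join on request_id; `labels` maps request_id -> 0 or 1.

    Predictions whose label has not arrived yet are skipped.
    """
    return [(p, labels[p.request_id]) for p in predictions if p.request_id in labels]


def calibration_error(pairs, bins=10):
    """Weighted mean of |mean score - positive rate| over equal-width bins."""
    buckets = defaultdict(list)
    for score, label in pairs:
        buckets[min(int(score * bins), bins - 1)].append((score, label))
    total = len(pairs)
    error = 0.0
    for items in buckets.values():
        mean_score = sum(s for s, _ in items) / len(items)
        positive_rate = sum(l for _, l in items) / len(items)
        error += len(items) / total * abs(mean_score - positive_rate)
    return error


def windowed_metrics(joined, threshold=0.5):
    """Group joined pairs into fixed windows and compute metrics per window."""
    windows = defaultdict(list)
    for p, label in joined:
        start = int(p.timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[start].append((p.score, label))
    result = {}
    for start, pairs in sorted(windows.items()):
        correct = sum((s >= threshold) == bool(l) for s, l in pairs)
        result[start] = {
            "count": len(pairs),
            "accuracy": correct / len(pairs),
            "calibration_error": calibration_error(pairs),
        }
    return result


def render_prometheus(metrics_by_window):
    """Render the most recent window as Prometheus text exposition lines."""
    if not metrics_by_window:
        return ""
    start = max(metrics_by_window)
    m = metrics_by_window[start]
    lines = []
    for name in ("accuracy", "calibration_error"):
        lines.append(f"# TYPE eval_{name} gauge")
        lines.append(f'eval_{name}{{window_start="{start}"}} {m[name]}')
    return "\n".join(lines) + "\n"
```

In a real sidecar the join would run over a stream with a retention bound for late labels, and the rendered text would be served from the /metrics handler; a Prometheus client library would normally replace the hand-rolled renderer.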

Guardrails

  • Cap shadow traffic at 10% of production throughput.
  • Drop eval writes when the topic lag exceeds 30 seconds.
  • Auto-rollback the candidate model if its p99 latency regresses against the production baseline.
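
The three guardrails reduce to small predicates, sketched below under stated assumptions. Hash-based sampling approximates the 10% cap by request count rather than by measured throughput, the lag value is assumed to come from your consumer-group monitoring, and the tolerance factor for the p99 comparison is a hypothetical parameter that the recipe does not specify.

```python
import hashlib
import math

SHADOW_FRACTION = 0.10        # cap from the first guardrail
MAX_TOPIC_LAG_SECONDS = 30.0  # cutoff from the second guardrail


def in_shadow_sample(request_id: str, fraction: float = SHADOW_FRACTION) -> bool:
    """Deterministically route roughly `fraction` of requests to the shadow path."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < fraction


def should_write_eval(topic_lag_seconds: float) -> bool:
    """Skip eval writes once the topic lag exceeds the cutoff."""
    return topic_lag_seconds <= MAX_TOPIC_LAG_SECONDS


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty sample."""
    if not latencies_ms:
        raise ValueError("p99 requires at least one sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * 99 / 100)
    return ordered[rank - 1]


def should_rollback(candidate_ms, baseline_ms, tolerance: float = 1.10) -> bool:
    """Flag a rollback when candidate p99 exceeds baseline p99 by the tolerance.

    `tolerance` is an assumed knob; 1.10 means a 10% allowance.
    """
    return p99(candidate_ms) > tolerance * p99(baseline_ms)
```

Hashing on request_id keeps the sampling decision stable across retries, so a given request is either always or never shadow-scored.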