ML model registry design

A reference architecture for tracking, versioning, and deploying machine learning models across training and inference pipelines.

Core entities

  • Model — logical grouping (e.g. “fraud-detector”)
  • Version — immutable snapshot with artifact URI, metrics, and metadata
  • Stage — lifecycle state: staging, production, archived
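
As a rough illustration, the three entities could be modeled as below. This is a sketch under assumptions, not a prescribed schema: the class and field names, the integer version ids, and the choice to keep stage assignments on the model (so that a version itself stays immutable) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from types import MappingProxyType
from typing import Dict, Mapping, Optional


class Stage(Enum):
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class Version:
    """Immutable snapshot; frozen=True rejects attribute reassignment."""
    model_name: str
    version_id: int
    artifact_uri: str
    metrics: Mapping[str, float]
    metadata: Mapping[str, str]


@dataclass
class Model:
    """Logical grouping of versions, e.g. "fraud-detector"."""
    name: str
    versions: Dict[int, Version] = field(default_factory=dict)
    # Assumption: stage is tracked beside the version, not inside it,
    # so changing a stage never mutates the snapshot.
    stages: Dict[int, Stage] = field(default_factory=dict)

    def register(
        self,
        artifact_uri: str,
        metrics: Mapping[str, float],
        metadata: Optional[Mapping[str, str]] = None,
    ) -> Version:
        # Assumption: version ids are sequential integers starting at 1.
        version_id = len(self.versions) + 1
        version = Version(
            model_name=self.name,
            version_id=version_id,
            artifact_uri=artifact_uri,
            # Read-only views over private copies keep the maps immutable too.
            metrics=MappingProxyType(dict(metrics)),
            metadata=MappingProxyType(dict(metadata or {})),
        )
        self.versions[version_id] = version
        return version
```

In this sketch a freshly registered version has no stage until one is assigned, since the list above names only staging, production, and archived.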

API surface

POST   /models/:name/versions
GET    /models/:name/versions/:id
PUT    /models/:name/versions/:id/stage
GET    /models/:name/production
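
A minimal client sketch for these four routes, using only the Python standard library, might look like the following. The host name, the JSON body fields (`artifact_uri`, `metrics`, `stage`), and the function names are assumptions for illustration; the routes above do not specify request or response shapes. The functions only build requests and send nothing.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://registry.example.com"  # hypothetical host


def _json_request(method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> Request:
    """Build (but do not send) a request with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def _model_path(name: str) -> str:
    # Percent-encode the model name so it is safe as a single path segment.
    return "/models/" + quote(name, safe="")


def register_version(name: str, artifact_uri: str, metrics: Dict[str, float]) -> Request:
    # POST /models/:name/versions (body fields are assumed)
    body = {"artifact_uri": artifact_uri, "metrics": metrics}
    return _json_request("POST", _model_path(name) + "/versions", body)


def get_version(name: str, version_id: int) -> Request:
    # GET /models/:name/versions/:id
    return _json_request("GET", f"{_model_path(name)}/versions/{version_id}")


def set_stage(name: str, version_id: int, stage: str) -> Request:
    # PUT /models/:name/versions/:id/stage (body field is assumed)
    return _json_request("PUT", f"{_model_path(name)}/versions/{version_id}/stage", {"stage": stage})


def get_production(name: str) -> Request:
    # GET /models/:name/production
    return _json_request("GET", _model_path(name) + "/production")
```

Each returned `Request` can be passed to `urllib.request.urlopen` once a real registry endpoint exists.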

Storage layout

Artifacts land in S3-compatible object storage under models/{name}/{version}/. Metadata lives in a relational store with strict schema versioning.
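
The layout can be sketched as follows, with SQLite standing in for the relational store. The table and column names are hypothetical, and using SQLite's `PRAGMA user_version` as the schema-version marker is one possible way to enforce strict versioning, not necessarily the one intended here.

```python
import sqlite3

SCHEMA_VERSION = 1  # hypothetical current schema version


def artifact_prefix(name: str, version: int) -> str:
    """Object-storage key prefix for one version's artifacts."""
    return f"models/{name}/{version}/"


def open_metadata_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store, creating the schema on first use.

    Refuses to proceed if the stored schema version differs from the
    one this code expects.
    """
    conn = sqlite3.connect(path)
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current == 0:
        # Fresh database: create the schema and stamp its version.
        conn.executescript(
            """
            CREATE TABLE versions (
                model_name      TEXT    NOT NULL,
                version_id      INTEGER NOT NULL,
                artifact_prefix TEXT    NOT NULL,
                stage           TEXT
                    CHECK (stage IN ('staging', 'production', 'archived')),
                PRIMARY KEY (model_name, version_id)
            );
            """
        )
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
        conn.commit()
    elif current != SCHEMA_VERSION:
        conn.close()
        raise RuntimeError(
            f"schema version {current} found, expected {SCHEMA_VERSION}"
        )
    return conn
```

The `CHECK` constraint rejects any stage outside the three lifecycle states, while a NULL stage (a version not yet staged) is allowed in this sketch.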

Promotion flow

  1. CI trains and registers a new version
  2. Integration tests validate against a holdout set
  3. Manual approval promotes to staging
  4. Canary deploy validates latency and accuracy
  5. Full promotion flips the production pointer
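
The five steps above can be sketched as a small state machine that refuses to skip a step and flips the production pointer only at the end. The step names, the `Promotion` class, and the dictionary used as the production pointer are all assumptions made for illustration.

```python
from enum import IntEnum
from typing import Dict


class Step(IntEnum):
    REGISTERED = 1      # CI trained and registered the version
    TESTS_PASSED = 2    # integration tests passed on the holdout set
    STAGING = 3         # manual approval granted
    CANARY_PASSED = 4   # canary validated latency and accuracy
    PRODUCTION = 5      # production pointer flipped


class PromotionError(Exception):
    """Raised when a step is skipped or its gate fails."""


class Promotion:
    def __init__(self, model: str, version_id: int, production_pointer: Dict[str, int]):
        self.model = model
        self.version_id = version_id
        # Hypothetical pointer store: model name -> production version id.
        self.production_pointer = production_pointer
        self.step = Step.REGISTERED

    def advance(self, to: Step, gate_passed: bool) -> None:
        """Move to the next step if it is adjacent and its gate passed."""
        if to != self.step + 1:
            raise PromotionError(f"cannot go from {self.step.name} to {to.name}")
        if not gate_passed:
            raise PromotionError(f"gate for {to.name} failed")
        self.step = to
        if to is Step.PRODUCTION:
            # The only place the production pointer changes.
            self.production_pointer[self.model] = self.version_id
```

Keeping the pointer flip as the final, single write means a failed gate at any earlier step leaves the current production version untouched.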

Observability

Every promotion emits a structured audit log entry. A dashboard surfaces drift between the metrics recorded at registration and live inference telemetry, and a rollback is triggered when that drift exceeds configured thresholds.
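
A minimal sketch of the audit entry and the drift check follows. The field names in the log record, the metric names in the usage note, and the use of absolute difference as the drift measure are assumptions; a real deployment might use relative change or a statistical test instead.

```python
import json
import logging
from datetime import datetime, timezone
from typing import List, Mapping

logger = logging.getLogger("registry.audit")  # hypothetical logger name


def audit_event(model: str, version_id: int, action: str, actor: str) -> str:
    """Emit one promotion event as a single JSON line and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "version_id": version_id,
        "action": action,
        "actor": actor,
    }
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line


def drifted_metrics(
    registered: Mapping[str, float],
    live: Mapping[str, float],
    thresholds: Mapping[str, float],
) -> List[str]:
    """Names of metrics whose absolute drift exceeds its threshold.

    Metrics missing from either side are skipped rather than flagged.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in registered
        and name in live
        and abs(registered[name] - live[name]) > limit
    )


def should_roll_back(
    registered: Mapping[str, float],
    live: Mapping[str, float],
    thresholds: Mapping[str, float],
) -> bool:
    """True when at least one metric has breached its threshold."""
    return bool(drifted_metrics(registered, live, thresholds))
```

For example, with a registered accuracy of 0.95, a live accuracy of 0.90, and an accuracy threshold of 0.02, `should_roll_back` returns `True`.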