ML model registry design
A reference architecture for tracking, versioning, and deploying machine learning models across training and inference pipelines.
Core entities
- Model — logical grouping (e.g. “fraud-detector”)
- Version — immutable snapshot with artifact URI, metrics, and metadata
- Stage — lifecycle state: staging, production, archived
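The three entities above could be modelled roughly as follows. This is a minimal sketch, not a prescribed schema: the field names, the integer version ids, the `add_version` helper, and the choice to track the stage outside the immutable snapshot (with `None` meaning "registered, no stage yet") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Mapping, Optional


class Stage(Enum):
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"


@dataclass(frozen=True)
class Version:
    """Snapshot of one trained artifact.

    frozen=True blocks attribute reassignment. The dicts themselves are
    still mutable, so a stricter implementation might wrap them in
    types.MappingProxyType.
    """
    model_name: str
    version_id: int
    artifact_uri: str
    metrics: Mapping[str, float] = field(default_factory=dict)
    metadata: Mapping[str, str] = field(default_factory=dict)


@dataclass
class Model:
    """Logical grouping of versions, e.g. "fraud-detector"."""
    name: str
    versions: Dict[int, Version] = field(default_factory=dict)
    # Stage is mutable lifecycle state, so it is kept outside Version
    # (assumption). None means "registered, not yet staged".
    stages: Dict[int, Optional[Stage]] = field(default_factory=dict)

    def add_version(self, artifact_uri, metrics, metadata=None) -> Version:
        """Hypothetical helper: assign the next id and store the snapshot."""
        version_id = len(self.versions) + 1
        version = Version(
            self.name, version_id, artifact_uri,
            dict(metrics), dict(metadata or {}),
        )
        self.versions[version_id] = version
        self.stages[version_id] = None
        return version
```

Keeping the stage in a separate mapping means a promotion never has to touch the version record itself, which is one way to honour the "immutable snapshot" requirement.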
API surface
POST /models/:name/versions
GET /models/:name/versions/:id
PUT /models/:name/versions/:id/stage
GET /models/:name/production
Storage layout
Artifacts land in S3-compatible object storage under models/{name}/{version}/. Metadata lives in a relational store with strict schema versioning.
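As a rough illustration of this layout, the sketch below builds the `models/{name}/{version}/` key prefix and opens a metadata store that refuses to run against an unexpected schema version. SQLite and its `PRAGMA user_version` counter stand in for the relational store here; the table name, column names, and both function names are assumptions, not part of the design above.

```python
import sqlite3

SCHEMA_VERSION = 1  # bump together with a migration (assumed convention)


def artifact_prefix(name: str, version: int) -> str:
    """Return the object-storage key prefix models/{name}/{version}/."""
    if not name or "/" in name:
        raise ValueError(f"invalid model name: {name!r}")
    if version < 1:
        raise ValueError(f"invalid version: {version}")
    return f"models/{name}/{version}/"


def open_metadata_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata store, creating or verifying the schema."""
    conn = sqlite3.connect(path)
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    if current == 0:
        # Fresh database: create the schema and stamp its version.
        conn.executescript(
            f"""
            CREATE TABLE versions (
                model_name   TEXT    NOT NULL,
                version_id   INTEGER NOT NULL,
                artifact_uri TEXT    NOT NULL,
                stage        TEXT CHECK (
                    stage IN ('staging', 'production', 'archived')
                ),
                metrics_json TEXT    NOT NULL,
                PRIMARY KEY (model_name, version_id)
            );
            PRAGMA user_version = {SCHEMA_VERSION};
            """
        )
    elif current != SCHEMA_VERSION:
        # Strict versioning: fail fast instead of guessing at a migration.
        conn.close()
        raise RuntimeError(
            f"schema version {current}, expected {SCHEMA_VERSION}"
        )
    return conn
```

The `CHECK` constraint rejects any stage outside the three named earlier, while a `NULL` stage (a version that has not been staged yet) is still accepted.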
Promotion flow
- CI trains and registers a new version
- Integration tests validate against a holdout set
- Manual approval promotes to staging
- Canary deploy validates latency and accuracy
- Full promotion flips the production pointer
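The flow above can be sketched as a small in-memory state machine. Everything named here is hypothetical: the `Registry` class, the `run_promotion` helper, the boolean gate arguments, and the transition table are illustrative assumptions. In particular, archiving the previous production version when the pointer flips, and allowing an archived version to be re-promoted for rollback, are design choices the steps above do not specify.

```python
from typing import Dict, Optional, Tuple

# Allowed stage transitions (assumption; the flow only names the stages).
# None means "registered, not yet staged".
ALLOWED = {
    None: {"staging"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": {"production"},  # re-promotion permits rollback
}


class Registry:
    def __init__(self) -> None:
        self._stages: Dict[str, Dict[int, Optional[str]]] = {}
        self._production: Dict[str, int] = {}  # model name -> live version

    def register(self, name: str) -> int:
        """Step 1: record a new version with no stage yet."""
        versions = self._stages.setdefault(name, {})
        version_id = len(versions) + 1
        versions[version_id] = None
        return version_id

    def stage(self, name: str, version_id: int) -> Optional[str]:
        return self._stages[name][version_id]

    def production(self, name: str) -> Optional[int]:
        return self._production.get(name)

    def set_stage(self, name: str, version_id: int, stage: str) -> None:
        current = self._stages[name][version_id]
        if stage not in ALLOWED[current]:
            raise ValueError(f"cannot move {current!r} -> {stage!r}")
        if stage == "production":
            # Flip the pointer and archive whatever was live before.
            previous = self._production.get(name)
            if previous is not None:
                self._stages[name][previous] = "archived"
            self._production[name] = version_id
        elif current == "production":
            # Archiving the live version leaves no production pointer.
            del self._production[name]
        self._stages[name][version_id] = stage


def run_promotion(
    registry: Registry,
    name: str,
    tests_pass: bool,
    approved: bool,
    canary_ok: bool,
) -> Tuple[int, Optional[str]]:
    """Walk one new version through the gates; return (id, final stage)."""
    version_id = registry.register(name)                  # CI registers
    if tests_pass and approved:                           # tests + approval
        registry.set_stage(name, version_id, "staging")
        if canary_ok:                                     # canary gate
            registry.set_stage(name, version_id, "production")
    return version_id, registry.stage(name, version_id)
```

Because a freshly registered version can only move to staging, a version cannot reach production without passing through the approval step, even if a caller invokes `set_stage` directly.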
Observability
Every promotion emits a structured audit log entry. A dashboard surfaces drift between the metrics registered with a version and live inference telemetry, and a rollback is triggered when that drift exceeds configured thresholds.
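One possible shape for the audit entry and the drift check is sketched below. The JSON field names, the function names, and the definition of drift as the absolute difference between a registered metric and its live value are assumptions; a real deployment might use relative drift or a statistical test instead.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def audit_record(
    model: str,
    version_id: int,
    from_stage: Optional[str],
    to_stage: str,
    actor: str,
) -> str:
    """Serialize one promotion event as a single JSON log line."""
    return json.dumps(
        {
            "event": "promotion",
            "model": model,
            "version_id": version_id,
            "from_stage": from_stage,
            "to_stage": to_stage,
            "actor": actor,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        sort_keys=True,
    )


def breached_metrics(
    registered: Dict[str, float],
    live: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Return metrics whose absolute drift exceeds their threshold.

    Metrics missing from either side are skipped rather than treated
    as breaches (assumption).
    """
    breached = []
    for name, limit in thresholds.items():
        if name in registered and name in live:
            if abs(live[name] - registered[name]) > limit:
                breached.append(name)
    return breached


def should_roll_back(
    registered: Dict[str, float],
    live: Dict[str, float],
    thresholds: Dict[str, float],
) -> bool:
    """True when at least one monitored metric has breached."""
    return bool(breached_metrics(registered, live, thresholds))
```

Returning the list of breached metric names, rather than only a boolean, lets the rollback itself be logged with the reason it fired.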