Recommender system architecture
A production blueprint for building a multi-stage recommendation pipeline that balances latency, personalization, and cold-start handling.
Overview
This recipe covers a three-tier architecture: candidate generation via collaborative filtering, a lightweight ranking layer using gradient-boosted trees, and a final re-ranking pass with business rules. All stages are stateless and horizontally scalable behind a gRPC gateway.
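The three tiers can be composed as plain functions. Each one takes its inputs explicitly and keeps nothing between requests, which is what lets every replica serve any request. The sketch below is a minimal, hypothetical illustration under these assumptions:

- Exact cosine top-k stands in for the Annoy lookup.
- A dictionary of made-up scores (`toy_score`) stands in for the XGBoost ranker.
- A single "blocked items" rule stands in for the rule engine.

None of these function names come from the recipe.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: exact top-k by cosine similarity (stand-in for the Annoy ANN lookup)."""
    return sorted(item_vecs, key=lambda i: cosine(user_vec, item_vecs[i]), reverse=True)[:k]

def rank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Stage 2: order candidates by model score (stand-in for the XGBoost ranker)."""
    return sorted(candidates, key=score, reverse=True)

def rerank(ranked: List[str], blocked: Set[str], n: int) -> List[str]:
    """Stage 3: apply one business rule (drop blocked items) and cut to the slate size."""
    return [i for i in ranked if i not in blocked][:n]

def recommend(user_vec: Vector, item_vecs: Dict[str, Vector],
              score: Callable[[str], float], blocked: Set[str],
              recall_k: int = 100, n: int = 10) -> List[str]:
    # No state survives between calls, so any replica can serve any request.
    return rerank(rank(recall(user_vec, item_vecs, recall_k), score), blocked, n)

items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.7, 0.7]}
toy_score = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.6}.get  # hypothetical ranker output
print(recommend([1.0, 0.0], items, toy_score, blocked={"d"}, recall_k=3, n=2))  # ['b', 'a']
```

Recall narrows the catalog to a, b and d by similarity. The ranker then reorders them by its own score, and the re-ranker removes d and trims the list to the slate size.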
Components
- Recall service: Annoy approximate-nearest-neighbor index over user/item embeddings, rebuilt hourly by Spark jobs. Annoy indexes cannot accept new items once built, so each refresh produces a new index.
- Ranker: XGBoost model served with ONNX Runtime; features include dwell time, CTR history, and contextual signals.
- Re-ranker: rule engine enforcing diversity, freshness boosts, and inventory constraints.
- Feature store: Redis-backed, with write-behind persistence to S3; p99 read latency under 5 ms.
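The re-ranker's three rule families can be applied in sequence over the ranker's output. The sketch below is one hypothetical way to do it and makes these assumptions:

- Inventory is a simple in-stock flag.
- Freshness is a fixed multiplicative boost for items younger than `fresh_hours`. This assumes ranker scores are non-negative.
- Diversity is a greedy per-category cap.

The `Candidate` fields and all parameter names are illustrative, not taken from the recipe.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Candidate:
    item_id: str
    score: float      # ranker output, assumed non-negative
    category: str
    age_hours: float  # time since the item was published
    in_stock: bool

def rerank(cands: List[Candidate], n: int, fresh_hours: float = 24.0,
           fresh_boost: float = 1.2, max_per_category: int = 2) -> List[str]:
    # Rule 1 (inventory): drop anything that cannot be fulfilled.
    eligible = [c for c in cands if c.in_stock]

    # Rule 2 (freshness): multiply the score of recent items by a fixed boost.
    def adjusted(c: Candidate) -> float:
        return c.score * fresh_boost if c.age_hours < fresh_hours else c.score
    eligible.sort(key=adjusted, reverse=True)

    # Rule 3 (diversity): fill the slate greedily, capping items per category.
    per_cat: Dict[str, int] = {}
    slate: List[str] = []
    for c in eligible:
        if per_cat.get(c.category, 0) >= max_per_category:
            continue
        slate.append(c.item_id)
        per_cat[c.category] = per_cat.get(c.category, 0) + 1
        if len(slate) == n:
            break
    return slate

cands = [
    Candidate("a", 0.90, "shoes", 200, True),
    Candidate("b", 0.85, "shoes", 300, True),
    Candidate("c", 0.80, "shoes", 5, True),    # fresh, so boosted to about 0.96
    Candidate("d", 0.70, "hats", 400, True),
    Candidate("e", 0.95, "hats", 10, False),   # out of stock, so dropped
]
print(rerank(cands, n=3))  # ['c', 'a', 'd']
```

The rules run in a fixed order. Hard constraints come first, so no boost can resurrect an unfulfillable item. Score adjustments come next, and the diversity cap goes last so that it operates on final scores.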
Cold start
New users receive popularity-smoothed recommendations seeded from a global trending pool. New items are placed in dedicated explore slots, allocated by Thompson sampling, until they accumulate 50 interactions.
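Thompson sampling allocates explore slots by drawing once from each new item's posterior click-through rate and giving the slots to the highest draws. The sketch below is a minimal illustration and makes these assumptions:

- A Beta-Bernoulli model with a uniform Beta(1, 1) prior.
- Every served impression counts as one "interaction".
- The 50-interaction graduation threshold is taken from the recipe.

`NewItem`, `fill_explore_slots` and `record` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List

GRADUATION_THRESHOLD = 50  # from the recipe: items leave the explore pool after 50 interactions

@dataclass
class NewItem:
    item_id: str
    clicks: int = 0
    impressions: int = 0  # assumption: every served impression counts as an interaction

    def sample_ctr(self, rng: random.Random) -> float:
        # One draw from the Beta posterior over CTR, starting from a uniform Beta(1, 1) prior.
        return rng.betavariate(1 + self.clicks, 1 + self.impressions - self.clicks)

def fill_explore_slots(pool: List[NewItem], n_slots: int, rng: random.Random) -> List[str]:
    """Pick up to n_slots non-graduated items, ranked by one posterior draw each."""
    active = [it for it in pool if it.impressions < GRADUATION_THRESHOLD]
    draws = {it.item_id: it.sample_ctr(rng) for it in active}
    return sorted(draws, key=draws.get, reverse=True)[:n_slots]

def record(item: NewItem, clicked: bool) -> None:
    """Update the item's counts after it has been shown once."""
    item.impressions += 1
    item.clicks += int(clicked)

rng = random.Random(7)
pool = [NewItem("x"), NewItem("y"), NewItem("z", clicks=10, impressions=50)]
print(sorted(fill_explore_slots(pool, 2, rng)))  # ['x', 'y'] ("z" has already graduated)

true_ctr = {"x": 0.05, "y": 0.30}  # hypothetical ground truth for the simulation
by_id = {it.item_id: it for it in pool}
for _ in range(200):
    for item_id in fill_explore_slots(pool, 1, rng):
        record(by_id[item_id], rng.random() < true_ctr[item_id])
print(fill_explore_slots(pool, 2, rng))  # [] (every item has reached 50 interactions)
```

Items with more observed clicks tend to draw higher and win slots more often. Items with little data still win occasionally, because their posteriors are wide. This is what lets a new item earn exposure without a hand-tuned exploration rate.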
Observability
Prometheus metrics cover recall latency, ranking precision@k, and a diversity index. OpenTelemetry traces span the full request path, from the API gateway through each service.
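The two quality metrics can be computed per slate before being exported. Precision@k is standard: the fraction of the top k recommendations the user engaged with. The recipe does not define its diversity index, so the sketch below assumes intra-list diversity, measured as the share of item pairs in the slate that come from different categories. Function names are hypothetical. Exporting the values to Prometheus, for example as a histogram or gauge through a client library, is outside the sketch.

```python
from itertools import combinations
from typing import Dict, Sequence, Set

def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def intra_list_diversity(recommended: Sequence[str], category: Dict[str, str]) -> float:
    """Share of item pairs in different categories (0 = all same, 1 = all distinct)."""
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    differing = sum(1 for a, b in pairs if category[a] != category[b])
    return differing / len(pairs)

slate = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}  # items the user clicked (hypothetical)
cats = {"a": "shoes", "b": "shoes", "c": "hats", "d": "bags"}
print(precision_at_k(slate, relevant, 4))            # 0.5
print(round(intra_list_diversity(slate, cats), 3))   # 0.833
```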