Recipe

Recommender system architecture

A production blueprint for building a multi-stage recommendation pipeline that balances latency, personalization, and cold-start handling.

Overview

This recipe covers a three-tier architecture: candidate generation via collaborative filtering, a lightweight ranking layer using gradient-boosted trees, and a final re-ranking pass with business rules. All stages are stateless and horizontally scalable behind a gRPC gateway.
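The three tiers can be sketched as plain functions to show how they compose. This is a minimal illustration, not the production services: the brute-force dot-product search stands in for the approximate nearest-neighbour index, `score_fn` stands in for the trained ranking model, and the per-category cap is one hypothetical business rule. All names (`Item`, `recall`, `rank`, `rerank`, `recommend`) are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    embedding: Sequence[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def recall(user_vec: Sequence[float], catalog: List[Item], n: int) -> List[Item]:
    """Stage 1: top-n items by dot product (brute-force stand-in for an ANN index)."""
    by_similarity = sorted(
        catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True
    )
    return by_similarity[:n]


def rank(candidates: List[Item], score_fn: Callable[[Item], float]) -> List[Item]:
    """Stage 2: order candidates by a model score (stand-in for the tree model)."""
    return sorted(candidates, key=score_fn, reverse=True)


def rerank(ranked: List[Item], max_per_category: int, k: int) -> List[Item]:
    """Stage 3: hypothetical diversity rule that caps items per category."""
    counts: Dict[str, int] = {}
    out: List[Item] = []
    for item in ranked:
        if len(out) == k:
            break
        if counts.get(item.category, 0) < max_per_category:
            out.append(item)
            counts[item.category] = counts.get(item.category, 0) + 1
    return out


def recommend(
    user_vec: Sequence[float],
    catalog: List[Item],
    score_fn: Callable[[Item], float],
    n_candidates: int = 100,
    k: int = 10,
    max_per_category: int = 2,
) -> List[Item]:
    # Each stage is a pure function of its inputs, mirroring the stateless design.
    candidates = recall(user_vec, catalog, n_candidates)
    ranked = rank(candidates, score_fn)
    return rerank(ranked, max_per_category, k)
```

Because each stage only narrows or reorders the list it receives, the stages can be deployed and scaled separately without sharing state.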

Components

  • Recall service — Annoy index over user/item embeddings refreshed hourly via Spark jobs.
  • Ranker — XGBoost model served via ONNX Runtime; features include dwell time, CTR history, and contextual signals.
  • Re-ranker — Rule engine enforcing diversity, freshness boosts, and inventory constraints.
  • Feature store — Redis-backed with write-behind to S3; sub-5ms p99 reads.
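The write-behind pattern named for the feature store can be illustrated with a small in-memory sketch. Here a dict stands in for the low-latency store and a caller-supplied `flush` callback stands in for the durable archive; neither reflects a real Redis or S3 client API. The class name `WriteBehindStore` and the `batch_size` parameter are assumptions made for this example.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Optional, Tuple


class WriteBehindStore:
    """Serve reads from a hot store; persist writes to an archive in batches."""

    def __init__(
        self,
        flush: Callable[[List[Tuple[str, Any]]], None],
        batch_size: int = 100,
    ) -> None:
        self._hot: Dict[str, Any] = {}                    # stand-in for the fast store
        self._pending: Deque[Tuple[str, Any]] = deque()   # writes not yet archived
        self._flush = flush                               # stand-in for the archive writer
        self._batch_size = batch_size

    def put(self, key: str, value: Any) -> None:
        # The write is visible to readers immediately; archiving is deferred.
        self._hot[key] = value
        self._pending.append((key, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Reads never touch the archive, which keeps the read path short.
        return self._hot.get(key, default)

    def flush(self) -> None:
        # Hand all pending writes to the archive callback as one batch.
        if not self._pending:
            return
        batch = list(self._pending)
        self._pending.clear()
        self._flush(batch)
```

The trade-off in this design is durability: writes still in the pending queue are lost if the process dies before a flush, so a real deployment would also flush on a timer and on shutdown.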

Cold start

New users receive popularity-smoothed recommendations seeded from a global trending pool. New items are injected into explore slots chosen by Thompson sampling until each item accumulates 50 interactions.
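A minimal sketch of explore-slot selection with Thompson sampling, assuming a Beta-Bernoulli model with a uniform Beta(1, 1) prior. The recipe does not define what counts as an interaction, so this example assumes an interaction is one recorded click or skip. `NewItemStats` and `pick_explore_items` are hypothetical names; only the threshold of 50 comes from the text above.

```python
import random
from dataclasses import dataclass
from typing import List

GRADUATION_THRESHOLD = 50  # interactions after which an item leaves the explore pool


@dataclass
class NewItemStats:
    item_id: str
    clicks: int = 0   # positive outcomes (assumed definition)
    skips: int = 0    # negative outcomes (assumed definition)

    @property
    def interactions(self) -> int:
        return self.clicks + self.skips


def pick_explore_items(
    stats: List[NewItemStats], slots: int, rng: random.Random
) -> List[str]:
    """Fill explore slots by sampling each eligible item's click-rate posterior."""
    eligible = [s for s in stats if s.interactions < GRADUATION_THRESHOLD]
    # Draw one sample per item from Beta(1 + clicks, 1 + skips). Items with
    # little data have wide posteriors, so they sometimes draw high and get shown.
    sampled = [
        (rng.betavariate(1 + s.clicks, 1 + s.skips), s.item_id) for s in eligible
    ]
    sampled.sort(reverse=True)
    return [item_id for _, item_id in sampled[:slots]]
```

Passing an explicit `random.Random` instance keeps the selection reproducible in tests while staying random in production.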

Observability

Prometheus metrics cover recall latency, ranking precision@k, and a diversity index. OpenTelemetry traces span the full request path, from the API gateway through each service.
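Two of these metrics can be computed per request before being exported. The sketch below shows precision@k in its standard form, plus one possible diversity index. The recipe does not say how diversity is measured, so the share of distinct categories in the list is an assumption, and both function names are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item_id in recommended[:k] if item_id in relevant)
    # Dividing by k (not by the list length) penalizes lists shorter than k.
    return hits / k


def category_diversity(categories: Sequence[str]) -> float:
    """Assumed diversity index: distinct categories divided by list length."""
    if not categories:
        return 0.0
    return len(set(categories)) / len(categories)
```

For example, `precision_at_k(["a", "b", "c", "d"], {"a", "c", "z"}, 3)` counts two hits in the top three, and `category_diversity(["x", "x", "y", "z"])` sees three distinct categories across four items.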