Architecture
Recipe Reranker Design
Two-stage retrieval pipeline that pairs fast vector search with precision cross-encoder scoring for recipe recommendations.
Pipeline Overview
1. User query → embedding via text-embedding-3-small
2. ANN retrieval: top-100 candidates from Qdrant
3. Cross-encoder scores all 100 query-document pairs
4. Re-rank by cross-encoder score → top-10 returned
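The four steps can be sketched as a single function. This is a minimal illustration rather than the production code: `recommend`, `embed`, `ann_search`, and `score_pairs` are hypothetical names, and the three callables stand in for the embedding call, the Qdrant query, and the cross-encoder so the control flow can be read without any client library.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[str, str]


def recommend(
    query: str,
    embed: Callable[[str], Vector],                    # hypothetical: wraps the embedding model
    ann_search: Callable[[Vector, int], List[str]],    # hypothetical: wraps the vector store
    score_pairs: Callable[[List[Pair]], Sequence[float]],  # hypothetical: wraps the cross-encoder
    n_candidates: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Run retrieval then reranking and return (recipe_text, score) pairs."""
    # Step 1: embed the user query.
    query_vec = embed(query)

    # Step 2: fetch the nearest candidates from the ANN index.
    candidates = ann_search(query_vec, n_candidates)
    if not candidates:
        return []

    # Step 3: score every (query, candidate) pair with the cross-encoder.
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)

    # Step 4: sort by score, highest first, and keep the top_k.
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Passing the stages in as callables keeps the sketch independent of any particular SDK and makes each stage easy to replace with a stub in tests.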
Cross-Encoder Model
We run a fine-tuned MiniLM-L6-v2 cross-encoder on an A10G GPU via Modal. For each candidate, the model takes the query and recipe text as one concatenated [query, recipe_text] pair and outputs a single relevance logit. Scoring all 100 pairs completes within 80 ms at p99.
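The pair construction and scoring described above might look like the following sketch. The names `score_candidates`, `predict_logits`, and `chunked` are hypothetical, and the batch size of 32 is an assumption, not a documented setting; `predict_logits` stands in for the deployed model call. The `sigmoid` helper is optional: it maps a logit to a 0-1 score for display or thresholding and does not change the ranking, because it is monotonic.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]


def chunked(pairs: List[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive slices of at most batch_size pairs."""
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def score_candidates(
    query: str,
    recipe_texts: List[str],
    predict_logits: Callable[[List[Pair]], Sequence[float]],  # hypothetical model call
    batch_size: int = 32,  # assumed value, tune for the GPU in use
) -> List[float]:
    """Return one relevance logit per recipe, in input order."""
    pairs = [(query, text) for text in recipe_texts]
    logits: List[float] = []
    for batch in chunked(pairs, batch_size):
        logits.extend(predict_logits(batch))
    return logits
```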
Latency Budget
| Stage | p50 | p99 |
|---|---|---|
| Embedding | 45ms | 120ms |
| Qdrant ANN | 12ms | 35ms |
| Cross-encoder | 55ms | 80ms |
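As a rough planning aid, the per-stage figures can be summed, assuming the three stages run strictly in sequence and ignoring network and serialization overhead between them. The helper name `summed_latency_ms` is hypothetical. The p99 sum is only an estimate: percentiles of a sum do not generally equal the sum of percentiles, so the end-to-end p99 should be measured directly.

```python
# Per-stage latencies in milliseconds, copied from the table above.
STAGE_LATENCY_MS = {
    "Embedding": {"p50": 45, "p99": 120},
    "Qdrant ANN": {"p50": 12, "p99": 35},
    "Cross-encoder": {"p50": 55, "p99": 80},
}


def summed_latency_ms(stages: dict, percentile: str) -> int:
    """Add up one percentile column across all stages."""
    return sum(stage[percentile] for stage in stages.values())


if __name__ == "__main__":
    for percentile in ("p50", "p99"):
        total = summed_latency_ms(STAGE_LATENCY_MS, percentile)
        print(f"{percentile}: {total} ms")
```

With the table's values, the sums are 112 ms at p50 and 235 ms at p99.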