Recipe
Recipe: Semantic search engine
Build a production-grade semantic search pipeline using embeddings, vector storage, and cross-encoder reranking. From raw text to ranked results in under 200 lines.
~2700 bytesPython 3.11+Intermediate
Architecture
The pipeline has three stages: embedding generation via a sentence-transformer model, approximate nearest neighbor retrieval from a vector index, and cross-encoder reranking to refine the top-k candidates. This two-stage retrieval pattern gives you the speed of ANN with the accuracy of full cross-attention.
pipeline flow
Documents ──► Embedding Model ──► Vector Index (FAISS)
│
Query ──► Embedding Model ──► ANN Search ──┤
│
Top-100 ──► Cross-Encoder ──► Ranked Top-10Prerequisites
sentence-transformersEmbedding model inference
faiss-cpuVector similarity search
numpyArray operations
datasetsOptional: benchmark corpus
Key design decisions
- ▸Bi-encoder for indexing, cross-encoder for scoring. Bi-encoders produce fixed vectors for fast ANN; cross-encoders attend to query-document pairs jointly for higher accuracy.
- ▸FAISS IndexFlatIP with inner product. Cosine similarity via normalized vectors. Simple, deterministic, no training required for the index itself.
- ▸Rerank depth of 100. Retrieve 100 candidates from ANN, rerank with cross-encoder, return top 10. Balances latency and recall.