Recipe

Recipe: Semantic search engine

Build a production-grade semantic search pipeline using embeddings, vector storage, and cross-encoder reranking. From raw text to ranked results in under 200 lines.

~2700 bytesPython 3.11+Intermediate

Architecture

The pipeline has three stages: embedding generation via a sentence-transformer model, approximate nearest neighbor retrieval from a vector index, and cross-encoder reranking to refine the top-k candidates. This two-stage retrieval pattern gives you the speed of ANN with the accuracy of full cross-attention.

pipeline flow
Documents ──► Embedding Model ──► Vector Index (FAISS)
                                          │
Query ──► Embedding Model ──► ANN Search ──┤
                                          │
                              Top-100 ──► Cross-Encoder ──► Ranked Top-10

Prerequisites

sentence-transformers

Embedding model inference

faiss-cpu

Vector similarity search

numpy

Array operations

datasets

Optional: benchmark corpus

Key design decisions

  • Bi-encoder for indexing, cross-encoder for scoring. Bi-encoders produce fixed vectors for fast ANN; cross-encoders attend to query-document pairs jointly for higher accuracy.
  • FAISS IndexFlatIP with inner product. Cosine similarity via normalized vectors. Simple, deterministic, no training required for the index itself.
  • Rerank depth of 100. Retrieve 100 candidates from ANN, rerank with cross-encoder, return top 10. Balances latency and recall.