Recipe

Hybrid search design

Combine dense vector retrieval with lexical BM25 scoring to get the recall of embeddings plus the precision of keyword match. This recipe walks through the index layout, fusion strategy, and re-ranking stage Meridian uses in production.

1. Dual-index your corpus

Maintain a dense vector index (HNSW) alongside a sparse inverted index (BM25). Both should point at the same document IDs so fusion is a cheap join. Embed chunks of 400 to 800 tokens with a 50-token overlap so neither retriever loses context at chunk boundaries.

  • Store chunk text, source URL, and a stable doc_id on every record.
  • Use the same tokenizer family for chunking and BM25 analysis.
  • Rebuild both indices together to keep recall parity.
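
The chunking and record layout above could be sketched roughly as follows. This is a minimal illustration, not Meridian's implementation: it assumes the text has already been tokenized into a list of strings, and the helper names (chunk_tokens, make_records), the chunk_id field, and the 600-token default are hypothetical choices within the 400 to 800 range.

```python
def chunk_tokens(tokens, size=600, overlap=50):
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def make_records(doc_id, source_url, tokens, size=600, overlap=50):
    """Build one record per chunk, carrying chunk text, source URL, and doc_id."""
    return [
        {
            "doc_id": doc_id,
            # Hypothetical per-chunk key; both indices would store the same value.
            "chunk_id": f"{doc_id}:{i}",
            "source_url": source_url,
            # Joining on spaces is a stand-in for the tokenizer's own detokenizer.
            "text": " ".join(chunk),
        }
        for i, chunk in enumerate(chunk_tokens(tokens, size, overlap))
    ]
```

Writing the same records to both indices in one pass is one way to keep the IDs aligned, so the fusion step can join on them without a lookup table.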

2. Fuse with Reciprocal Rank Fusion

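RRF is parameter-light (a single constant, k) and uses only rank positions, never raw scores, so it tends to hold up better than weighted-sum fusion when the two scorers are on different scales, as dense similarity and BM25 scores are. Pull the top 50 from each retriever, then merge by reciprocal rank.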

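def rrf(dense_hits, sparse_hits, k=60, top_n=20):
    """Merge two ranked hit lists with Reciprocal Rank Fusion."""
    scores = {}
    for hits in (dense_hits, sparse_hits):
        # Ranks are 1-based, so the top hit in a list contributes 1 / (k + 1).
        for rank, doc in enumerate(hits, start=1):
            scores[doc.id] = scores.get(doc.id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; keep the top_n for the re-ranker.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]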

3. Re-rank the survivors

Pass the top 20 fused results through a cross-encoder re-ranker. A small bge-reranker model is enough; it catches cases where lexical match was misleading and where dense search drifted off-topic. Truncate to the top 5 before handing context to the LLM.
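
The re-ranking step can be sketched as below. This is an illustration under stated assumptions rather than the production code: rerank and overlap_score are hypothetical names, and overlap_score is only a toy stand-in so the example runs without a model. In practice score_fn would wrap whatever cross-encoder you load, such as a bge-reranker checkpoint.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[tuple[str, float]]:
    """Score each (doc_id, text) candidate against the query and keep the best top_n."""
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words found in the text."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / max(len(q_words), 1)
```

Keeping the scorer behind a plain callable makes it easy to swap models, or to wrap the call in a cache, without touching the truncation logic.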

Tip: cache the re-ranker scores keyed by (query_hash, doc_id). Repeat queries are common in agent workflows and the cache hit rate climbs above 40% in steady state.
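
One possible shape for that cache, as a minimal in-memory sketch. The class name RerankCache, the SHA-256 query hash, and the case and whitespace normalization are assumptions made for illustration; the recipe only specifies the (query_hash, doc_id) key.

```python
import hashlib


class RerankCache:
    """In-memory cache of re-ranker scores keyed by (query_hash, doc_id)."""

    def __init__(self):
        self._scores = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def query_hash(query: str) -> str:
        # Normalize case and whitespace so trivially different repeats share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_score(self, query, doc_id, text, score_fn):
        """Return the cached score, calling score_fn(query, text) only on a miss."""
        key = (self.query_hash(query), doc_id)
        if key in self._scores:
            self.hits += 1
            return self._scores[key]
        self.misses += 1
        score = self._scores[key] = score_fn(query, text)
        return score
```

The hits and misses counters give a direct way to measure the hit rate in your own workload. Because the key omits the chunk text and the model version, entries would need to be cleared whenever the indices are rebuilt or the re-ranker is swapped.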