Recipe

Reranking strategy

A retrieval pipeline that returns the right document at rank 12 is still a failed pipeline. Reranking is the second-stage sort that takes a noisy candidate set from your vector store and reorders it with a model that actually understands the query-document relationship. This recipe walks through how to wire a cross-encoder reranker into a Meridian RAG flow without blowing your latency budget.

1. Retrieve wide, rerank narrow

The cardinal rule: your first-stage retriever should over-fetch. If your final prompt uses 5 chunks, ask your vector store for 50. Bi-encoder embeddings are cheap and lossy; they will surface relevant context in the top-50 that they never would have ranked into the top-5. The reranker exists to recover that signal.

A good default is top_k=50 from the retriever, then top_n=5 after reranking. Tune the ratio per workload.
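
As a rough sketch of the over-fetch pattern, the two stages can be composed as below. `retrieve` and `rerank` are hypothetical caller-supplied functions standing in for your vector store client and your reranker call; neither is a Meridian API, and the result shape `{ index, score }` is an assumption made for this example.

```javascript
// Two-stage retrieval sketch. `retrieve` and `rerank` are hypothetical,
// caller-supplied async functions, not part of any specific SDK:
//   retrieve(query, k)      -> Promise<Array<{ id, text }>>
//   rerank(query, texts, n) -> Promise<Array<{ index, score }>>, best first
async function retrieveThenRerank(query, { retrieve, rerank, topK = 50, topN = 5 }) {
  if (topN > topK) {
    throw new RangeError('topN must not exceed topK');
  }

  // Stage 1: cheap, lossy, wide.
  const candidates = await retrieve(query, topK);
  if (candidates.length === 0) return [];

  // Stage 2: expensive, accurate, narrow.
  const n = Math.min(topN, candidates.length);
  const ranked = await rerank(query, candidates.map((c) => c.text), n);

  // Map reranker indices back onto the original candidate objects.
  return ranked.slice(0, n).map((r) => ({ ...candidates[r.index], score: r.score }));
}
```

Keeping both stages injectable makes the ratio easy to tune per workload: only `topK` and `topN` change, not the calling code.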

2. Pick the right model class

Cross-encoders score each (query, document) pair jointly, with one forward pass per pair. They are typically much more accurate than cosine similarity over independently computed embeddings, and much slower, because nothing about the pair can be precomputed at index time. That tradeoff is why you only run them on the candidate set, never on the full corpus.
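
To make the contrast concrete, here is a minimal sketch of first-stage scoring over independent embeddings. The vectors are toy values rather than real model output, and `cosine` and `rankByCosine` are illustrative helper names. Document vectors can be computed once at index time, so a query costs one embedding plus cheap arithmetic; a cross-encoder needs a model forward pass for every (query, document) pair.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  if (a.length !== b.length) throw new RangeError('vector length mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / Math.sqrt(normA * normB);
}

// Bi-encoder style ranking: doc vectors are precomputed, the query is embedded once.
function rankByCosine(queryVec, docVecs, k) {
  return docVecs
    .map((vec, index) => ({ index, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-dimensional "embeddings" for illustration only.
const docVecs = [
  [1, 0, 0],
  [0.6, 0.8, 0],
  [0, 0, 1],
];
const firstStage = rankByCosine([1, 0, 0], docVecs, 2);
console.log(firstStage.map((r) => r.index)); // [ 0, 1 ]
```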

For latency-sensitive paths, a small BGE or Cohere rerank model on 50 candidates typically lands in the 80-150 ms range, depending on hardware and document length. For batch enrichment jobs, where latency matters less, a larger reranker is worth the cost.
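
One way to keep a latency-sensitive path within budget is to race the reranker against a timer and fall back to first-stage order if it loses. The sketch below assumes a hypothetical `rerank` function that returns candidates in their new order; the 150 ms default simply mirrors the upper end of the range quoted above and is not a measured figure.

```javascript
// Resolve with the reranked list, or with the first-stage order truncated to
// topN if the reranker fails or exceeds budgetMs. `rerank` is a hypothetical
// caller-supplied function: (candidates, topN) -> Promise<candidates[]>.
async function rerankWithinBudget(candidates, rerank, { topN = 5, budgetMs = 150 } = {}) {
  const fallback = candidates.slice(0, topN);
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), budgetMs);
  });
  try {
    // Whichever settles first wins; the slower call is ignored, not cancelled.
    return await Promise.race([rerank(candidates, topN), timeout]);
  } catch (err) {
    // Reranker errors degrade to first-stage order rather than failing the request.
    return fallback;
  } finally {
    clearTimeout(timer);
  }
}
```

Degrading to first-stage order keeps the request alive at the cost of ranking quality. If the reranker is called over HTTP, passing an `AbortSignal` to `fetch` would additionally cancel the slow request instead of just ignoring it.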

3. Wire it through the Meridian gateway

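Meridian exposes reranking as a JSON endpoint at /v1/rerank, authenticated with a bearer token. (The OpenAI API itself defines no rerank route; the request shape resembles the one used by Cohere-style rerank APIs.) Send your query and candidate documents, and you get back a list of results sorted by relevance, each carrying the index of the original document and a relevance score. The gateway handles model selection, failover, and rate limiting.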

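// MERIDIAN_KEY, userQuery, and candidates are assumed to be defined by the caller.
const res = await fetch('https://llm.getnimbus.net/v1/rerank', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${MERIDIAN_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'rerank-multilingual-v3',
    query: userQuery,
    // Send only the text; indices in the response refer to this array's order.
    documents: candidates.map(c => c.text),
    top_n: 5,
  }),
});

// fetch only rejects on network errors, so check the HTTP status explicitly.
if (!res.ok) {
  throw new Error(`Rerank request failed with status ${res.status}`);
}

// Each result carries the index of a document in the request's documents array.
const { results } = await res.json();
const reranked = results.map(r => candidates[r.index]);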
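
The mapping step can also carry the score along and guard against malformed responses. This sketch assumes each result has the shape `{ index, relevance_score }`: the `index` field appears in the snippet above, while `relevance_score` is an assumed field name that should be checked against the actual response. `applyRerank` is an illustrative helper, not a Meridian function.

```javascript
// Join rerank results back onto candidates, keeping the score.
// Assumed result shape: { index: number, relevance_score: number }.
function applyRerank(candidates, results) {
  return results
    // Drop results whose index does not point at a real candidate.
    .filter((r) => Number.isInteger(r.index) && r.index >= 0 && r.index < candidates.length)
    .map((r) => ({ ...candidates[r.index], score: r.relevance_score }));
}

const sampleCandidates = [{ text: 'intro' }, { text: 'pricing' }, { text: 'limits' }];
const sampleResults = [
  { index: 2, relevance_score: 0.91 },
  { index: 0, relevance_score: 0.4 },
  { index: 7, relevance_score: 0.1 }, // out of range, dropped
];
console.log(applyRerank(sampleCandidates, sampleResults).map((c) => c.text));
// [ 'limits', 'intro' ]
```

Keeping the score on each chunk makes it possible to apply a relevance threshold later, for example dropping chunks below a cutoff before building the prompt.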