Multi-query retrieval
A single user query rarely surfaces every relevant chunk in your vector store. Multi-query retrieval rephrases the question into several semantic variants, runs them in parallel, and merges the results. This recipe shows how to wire it up against the Meridian gateway with three lines of orchestration code and a reranker pass at the end.
1. Generate query variants
Ask a cheap routing model to produce four to six paraphrases. Vary specificity, lexical choice, and the implicit time horizon. Keep each variant under sixteen tokens so embedding latency stays flat. Diversity matters more than fluency here.
2. Fan out the embeddings
Embed every variant in a single batched request, then issue parallel similarity searches against your index. De-duplicate by chunk id and keep the highest score per chunk. The union typically lifts recall@10 by twenty to thirty points on long-tail questions.
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.MERIDIAN_KEY,
baseURL: "https://llm.getnimbus.net/v1",
});
const variants = await client.chat.completions.create({
model: "azure/model-router",
messages: [
{ role: "system", content: "Rewrite the user query as 5 diverse search queries." },
{ role: "user", content: userQuestion },
],
});
const queries = variants.choices[0].message.content.split("\n");
const hits = await Promise.all(queries.map((q) => vectorSearch(q, 10)));
const merged = dedupeByChunkId(hits.flat());3. Rerank and trim
The merged set is wide but noisy. Pipe the top forty chunks through a cross-encoder reranker, then truncate to whatever your context budget allows. Meridian routes reranker calls through the same gateway, so you can switch providers without code changes when a faster model lands in the catalog.