Recipe
Query expansion patterns
Naive vector search misses 30-50% of relevant chunks when the user query is short, ambiguous, or phrased in different vocabulary than the indexed corpus. Query expansion fans a single user query out into multiple semantically related queries, runs each against the index, and merges the results. Meridian exposes three battle-tested expansion strategies behind a single endpoint.
1. Synonym expansion
The cheapest pattern: rewrite the query into 3-5 lexical variants using a small model. Best when your corpus is technical and your users are casual, or vice versa. Latency stays under 200ms because no reasoning model is involved. Route this to azure-swc/gpt-4o-mini.
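As a rough illustration of the pattern, the sketch below builds the rewrite prompt and cleans up the model output. It assumes a hypothetical `complete(prompt)` callback that wraps whichever small model you route to; the function names and the prompt wording are illustrative and not part of Meridian.

```javascript
// Sketch only: `complete` is a hypothetical async function (prompt -> string)
// that wraps a small model. It is not a Meridian API.
function buildSynonymPrompt(query, n) {
  return (
    `Rewrite the search query below into ${n} alternative phrasings, ` +
    `one per line, without numbering or commentary.\n\nQuery: ${query}`
  );
}

// Turn raw model output into a clean list: strip list markers, drop blank
// lines, remove case-insensitive duplicates, and keep the original query first.
function parseVariants(query, raw, max = 5) {
  const cleaned = raw
    .split('\n')
    .map((line) => line.replace(/^\s*(?:[-*]|\d+[.)])\s+/, '').trim());
  const seen = new Set();
  const variants = [];
  for (const text of [query.trim(), ...cleaned]) {
    const key = text.toLowerCase();
    if (text && !seen.has(key)) {
      seen.add(key);
      variants.push(text);
    }
  }
  return variants.slice(0, max + 1); // original query + up to `max` rewrites
}

async function synonymExpand(query, complete, n = 4) {
  const raw = await complete(buildSynonymPrompt(query, n));
  return parseVariants(query, raw, n);
}
```

Keeping the original query in the list means a poor rewrite can only add candidates, never replace the user's own wording.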
2. HyDE (Hypothetical Document Embeddings)
Generate a fake answer to the query, then embed the fake answer and search with it. This closes the lexical gap between question-style queries and statement-style documents. Pair HyDE with a reranker so hallucinated facts in the fake answer do not pollute results.
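The flow can be sketched as follows. This is a minimal outline, not Meridian's implementation: `generate`, `embed`, `searchByVector`, and `rerank` are hypothetical callbacks you would supply, and the 3x over-fetch factor is an arbitrary choice for illustration.

```javascript
// Sketch only: generate, embed, searchByVector and rerank are hypothetical
// async callbacks supplied by the caller. None of them are Meridian APIs.
async function hydeSearch(query, { generate, embed, searchByVector, rerank }, topK = 8) {
  // 1. Draft a plausible answer. It may contain invented facts; only its
  //    embedding is used for retrieval, never its text.
  const fakeAnswer = await generate(
    `Write a short passage that answers the question.\n\nQuestion: ${query}`
  );

  // 2. Embed the draft instead of the question, so the search vector
  //    resembles a statement-style document.
  const vector = await embed(fakeAnswer);

  // 3. Over-fetch candidates, then rerank against the ORIGINAL query so
  //    chunks that only match hallucinated details drop down the list.
  const candidates = await searchByVector(vector, topK * 3);
  const ranked = await rerank(query, candidates);
  return ranked.slice(0, topK);
}
```

The reranker scores candidates against the user's real question, which is what keeps the fake answer from steering the final ordering.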
3. Multi-query decomposition
For compound questions ("compare X and Y under condition Z"), split into atomic sub-queries, retrieve for each, then union the chunks. Route decomposition through azure/model-router so simple queries skip the reasoning hop.
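A minimal sketch of the retrieve-and-union step, assuming hypothetical `decompose` and `retrieve` callbacks and chunks shaped as `{ id, score, ... }`. The shape and names are assumptions for illustration; adapt them to your own index client.

```javascript
// Sketch only: `decompose` and `retrieve` are hypothetical async callbacks.
// Chunks are assumed to carry a stable `id` and a numeric `score`.
async function decomposeAndRetrieve(question, { decompose, retrieve }, topK = 8) {
  const subQueries = await decompose(question);

  // Sub-queries are independent, so retrieve for all of them concurrently.
  const perQuery = await Promise.all(subQueries.map((q) => retrieve(q, topK)));

  // Union by chunk id, keeping the best score and recording which
  // sub-queries surfaced each chunk.
  const byId = new Map();
  perQuery.forEach((chunks, i) => {
    for (const chunk of chunks) {
      const prev = byId.get(chunk.id);
      if (!prev) {
        byId.set(chunk.id, { ...chunk, matched: [subQueries[i]] });
      } else {
        prev.matched.push(subQueries[i]);
        prev.score = Math.max(prev.score, chunk.score);
      }
    }
  });
  return [...byId.values()];
}
```

Tracking `matched` is optional, but a chunk surfaced by several sub-queries is often the one that addresses the comparison itself, so it can be a useful signal downstream.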
End-to-end example
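// Query expansion via the Meridian gateway
const response = await fetch('https://meridian.getnimbus.net/v1/expand', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + process.env.MERIDIAN_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: 'how do I cache RAG embeddings',
    strategies: ['synonym', 'hyde', 'multi_query'],
    model: 'azure/model-router',
    max_variants: 6,
  }),
});
// Fail loudly on HTTP errors instead of parsing an error body as a result
if (!response.ok) {
  throw new Error(`Expansion request failed with status ${response.status}`);
}
const expansions = await response.json();
// expansions.variants -> string[]
// expansions.routed_model -> 'gpt-5-mini' | 'gpt-4o' | ...

// Search with every variant concurrently and keep the results for merging.
// vectorIndex is assumed to be your own vector store client; it is not defined here.
const resultsPerVariant = await Promise.all(
  expansions.variants.map((v) => vectorIndex.search(v, { topK: 8 }))
);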
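Searching once per variant yields one ranked list per variant, and those lists still need to be merged. The source does not say how the merge is done, so the sketch below uses reciprocal rank fusion (RRF), a common choice, purely as an assumption. It also assumes each hit carries a stable `id`; `k = 60` is the conventional default for RRF, not a Meridian setting.

```javascript
// Sketch only: reciprocal rank fusion (RRF) is one common merge strategy and
// is an assumption here. Each ranked list is an array of hits with a stable
// `id`, ordered best hit first.
function reciprocalRankFusion(rankedLists, k = 60) {
  const fused = new Map();
  for (const list of rankedLists) {
    list.forEach((hit, rank) => {
      const entry = fused.get(hit.id) || { hit, score: 0 };
      entry.score += 1 / (k + rank + 1); // rank is 0-based; RRF ranks start at 1
      fused.set(hit.id, entry);
    });
  }
  // Highest fused score first; attach the score for debugging.
  return [...fused.values()]
    .sort((a, b) => b.score - a.score)
    .map(({ hit, score }) => ({ ...hit, rrfScore: score }));
}
```

RRF uses only rank positions, so it works even when similarity scores from different variants are not directly comparable. Pass it one ranked list per variant and truncate the output to the number of chunks you want to keep.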