Recipe

Query expansion patterns

Naive vector search can miss 30-50% of relevant chunks when the user query is short, ambiguous, or uses different vocabulary from the indexed corpus. Query expansion fans a single user query out into several semantically related queries, runs each against the index, and merges the results. Meridian exposes three expansion strategies behind a single endpoint.
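The merge step can be done in several ways, and this page does not say which one Meridian uses. One common option is reciprocal rank fusion (RRF), sketched below as an illustration only. The function name `fuseRankings` and the constant `k = 60` are assumptions for this example, not part of the Meridian API.

```javascript
// Reciprocal rank fusion over several ranked lists of chunk ids.
// Each list is ordered best-first. A chunk's fused score is the sum of
// 1 / (k + rank) over every list it appears in, with rank starting at 1.
// Hypothetical helper for illustration; not a Meridian function.
function fuseRankings(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const previous = scores.get(id) ?? 0;
      scores.set(id, previous + 1 / (k + index + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// One ranked list per expanded query.
const fusedIds = fuseRankings([
  ['a', 'b', 'c'],
  ['b', 'a', 'd'],
  ['b', 'c'],
]);
console.log(fusedIds);
```

RRF only needs ranks, so it works even when the per-query similarity scores are not comparable with each other.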

1. Synonym expansion

The cheapest pattern: rewrite the query into 3-5 lexical variants using a small model. Best when your corpus is technical and your users are casual, or vice versa. Latency stays under 200ms because no reasoning model is involved. Route this to azure-swc/gpt-4o-mini.
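Whatever small model produces the variants, its raw text output usually needs cleanup before you search with it. The sketch below assumes the model returns one variant per line, possibly with list markers. That output format and the helper name `normalizeVariants` are assumptions for this example.

```javascript
// Turn raw model output into a short, de-duplicated list of query variants.
// The original query is always kept first so recall never gets worse than
// the unexpanded search. Hypothetical helper; the one-variant-per-line
// output format is an assumption.
function normalizeVariants(query, rawOutput, maxVariants = 5) {
  const original = query.trim();
  const variants = [original];
  const seen = new Set([original.toLowerCase()]);
  for (const line of rawOutput.split('\n')) {
    if (variants.length >= maxVariants) break;
    // Strip leading list markers such as "1.", "2)", "-", or "*".
    const text = line.replace(/^\s*(?:\d+[.)]|[-*])\s+/, '').trim();
    const key = text.toLowerCase();
    // Skip blank lines and case-insensitive duplicates.
    if (text === '' || seen.has(key)) continue;
    seen.add(key);
    variants.push(text);
  }
  return variants;
}

const raw = '1. store RAG embeddings\n- Cache RAG Embeddings\n\n2) persist embedding vectors';
console.log(normalizeVariants('cache RAG embeddings', raw));
```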

2. HyDE (Hypothetical Document Embeddings)

Generate a fake answer to the query, then embed the fake answer and search with it. This closes the lexical gap between question-style queries and statement-style documents. Pair HyDE with a reranker so hallucinated facts in the fake answer do not pollute results.
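The flow above can be sketched as a small function with the model, embedder, index, and reranker passed in. All four callbacks (`generate`, `embed`, `search`, `rerank`), the prompt wording, and the over-fetch factor of 3 are assumptions for this example; substitute your own clients.

```javascript
// HyDE sketch: search with the embedding of a generated answer, then rerank
// the candidates against the real query so that invented details in the
// fake answer do not decide the final order.
// Assumed callback shapes (hypothetical):
//   generate(prompt) -> Promise<string>
//   embed(text) -> Promise<number[]>
//   search(vector, limit) -> Promise<object[]>
//   rerank(query, candidates) -> Promise<object[]>  (best first)
async function hydeSearch(query, { generate, embed, search, rerank }, topK = 8) {
  const hypothetical = await generate(
    `Write a short passage that answers: ${query}`,
  );
  // Embed the generated answer, not the question.
  const vector = await embed(hypothetical);
  // Over-fetch so the reranker has room to discard off-topic matches.
  const candidates = await search(vector, topK * 3);
  const ranked = await rerank(query, candidates);
  return ranked.slice(0, topK);
}
```

Reranking uses the original query on purpose: the fake answer is only a retrieval probe, never a source of truth.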

3. Multi-query decomposition

For compound questions ("compare X and Y under condition Z"), split the question into atomic sub-queries, retrieve for each, then take the union of the retrieved chunks. Route decomposition through azure/model-router so simple queries skip the reasoning hop.
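The union step can be sketched as follows. The chunk shape `{ id, score, text }` and the helper name `unionChunks` are assumptions for this example; adapt them to whatever your index returns.

```javascript
// Union the chunks retrieved for each sub-query, keeping one copy per id.
// When the same chunk is returned for several sub-queries, the copy with
// the highest score wins. Hypothetical helper and chunk shape.
function unionChunks(resultLists) {
  const bestById = new Map();
  for (const list of resultLists) {
    for (const chunk of list) {
      const current = bestById.get(chunk.id);
      if (!current || chunk.score > current.score) {
        bestById.set(chunk.id, chunk);
      }
    }
  }
  // Highest score first.
  return [...bestById.values()].sort((a, b) => b.score - a.score);
}

const merged = unionChunks([
  [{ id: 'x1', score: 0.82, text: 'X overview' }, { id: 'z1', score: 0.4, text: 'Condition Z' }],
  [{ id: 'y1', score: 0.77, text: 'Y overview' }, { id: 'z1', score: 0.65, text: 'Condition Z' }],
]);
console.log(merged.map((chunk) => chunk.id));
```

Sorting by raw score assumes the scores from different sub-queries are comparable, which holds only when they come from the same index and embedding model.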

End-to-end example

// Query expansion via Meridian gateway
const expansions = await fetch('https://meridian.getnimbus.net/v1/expand', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + process.env.MERIDIAN_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    query: 'how do I cache RAG embeddings',
    strategies: ['synonym', 'hyde', 'multi_query'],
    model: 'azure/model-router',
    max_variants: 6,
  }),
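}).then((r) => {
  // fetch does not reject on HTTP error statuses, so check before parsing.
  if (!r.ok) throw new Error(`Expansion request failed: ${r.status}`);
  return r.json();
});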

// expansions.variants -> string[]
// expansions.routed_model -> 'gpt-5-mini' | 'gpt-4o' | ...
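// vectorIndex stands for your own vector store client; it is not defined here.
// Run the searches concurrently and keep the results so they can be merged.
const resultsPerVariant = await Promise.all(
  expansions.variants.map((v) => vectorIndex.search(v, { topK: 8 })),
);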