Recipe
Multi-query RAG fanout
Decompose a single user query into multiple sub-queries, retrieve documents for each in parallel, then fuse results before generation.
Overview
Standard RAG retrieves once. Multi-query RAG rewrites the user prompt into k distinct search queries, fans them out to your vector store, deduplicates the returned chunks, and feeds the union into the LLM. This improves recall when the original question spans multiple topics or uses ambiguous phrasing.
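The flow described above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `rewrite_query`, `vector_search`, `CORPUS`, and `Chunk` are hypothetical stand-ins (a real system would call an LLM and a vector store), and keyword overlap substitutes for vector similarity so the example runs without external services.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str


# Toy in-memory corpus standing in for a vector store (assumption).
CORPUS = [
    Chunk("c1", "refunds are issued within 14 days"),
    Chunk("c2", "shipping takes 3 to 5 business days"),
    Chunk("c3", "refunds require an order number"),
]


async def rewrite_query(question: str, k: int) -> list[str]:
    """Hypothetical rewrite step. A real one would prompt an LLM with
    `question`; this stub ignores it and returns canned variants."""
    variants = ["refunds policy", "shipping time", "order number"]
    return variants[:k]


async def vector_search(query: str) -> list[Chunk]:
    """Hypothetical search; keyword overlap stands in for similarity."""
    terms = set(query.split())
    return [c for c in CORPUS if terms & set(c.text.split())]


def dedupe(result_lists: list[list[Chunk]]) -> list[Chunk]:
    """Union of all results, keeping the first occurrence of each chunk_id."""
    seen: dict[str, Chunk] = {}
    for results in result_lists:
        for chunk in results:
            seen.setdefault(chunk.chunk_id, chunk)
    return list(seen.values())


async def retrieve_context(question: str, k: int = 3) -> list[Chunk]:
    queries = await rewrite_query(question, k)
    # Fan out: all k searches are awaited concurrently.
    result_lists = await asyncio.gather(*(vector_search(q) for q in queries))
    return dedupe(list(result_lists))


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble the deduplicated chunks into the context for generation."""
    context = "\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "How do refunds and shipping work?"
chunks = asyncio.run(retrieve_context(question))
prompt = build_prompt(question, chunks)
```

Chunk `c3` is returned by two of the three sub-queries but appears only once in `chunks`, which is the deduplication the overview describes. Keying on a stable chunk ID rather than on the text is a design choice that avoids treating near-identical chunks from different documents as duplicates.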
Pipeline
1. Rewrite: LLM generates k queries
2. Fanout: Parallel vector search
3. Fuse: Deduplicate & rank
4. Generate: LLM answers with context
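The recipe does not say how the Fuse step should rank chunks after deduplication. One common option is reciprocal rank fusion (RRF), sketched below as an illustration rather than as the method this recipe prescribes. The function name, the `smoothing` parameter (60 is a frequently used default), and the sample chunk IDs are assumptions, and each input list is assumed to contain unique IDs ordered best first.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], smoothing: int = 60
) -> list[str]:
    """Fuse ranked lists of chunk IDs; each hit adds 1 / (smoothing + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (smoothing + rank)
    # Highest fused score first; ties break on chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical per-query results, each ordered best first.
results = [
    ["c1", "c3"],        # hits for sub-query 1
    ["c2", "c1"],        # hits for sub-query 2
    ["c3", "c1", "c4"],  # hits for sub-query 3
]
fused = reciprocal_rank_fusion(results)
```

Chunks that several sub-queries agree on accumulate score and rise to the top, while chunks returned by a single sub-query sink. RRF uses only rank positions, so it needs no score normalization across queries.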
When to use
- Questions that span multiple documents or domains
- Ambiguous prompts needing disambiguation
- High-recall requirements where missing a chunk is costly
Trade-offs
- Higher token cost from the extra LLM rewrite call
- Increased latency if the k searches run sequentially rather than concurrently
- Risk of noisy, off-topic chunks diluting the context if k is too large
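The latency trade-off can be made concrete with a small timing sketch. It assumes a hypothetical `fake_search` coroutine that simulates a 50 ms round trip to a vector store; the numbers are illustrative, not measurements of any real system.

```python
import asyncio
import time


async def fake_search(query: str, delay: float = 0.05) -> str:
    # Simulated network round trip to a vector store (assumption: 50 ms).
    await asyncio.sleep(delay)
    return f"results for {query!r}"


async def sequential(queries: list[str]) -> list[str]:
    # Each search starts only after the previous one has finished.
    return [await fake_search(q) for q in queries]


async def concurrent(queries: list[str]) -> list[str]:
    # All searches are in flight at once; results keep input order.
    return list(await asyncio.gather(*(fake_search(q) for q in queries)))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    out = asyncio.run(coro)
    return out, time.perf_counter() - start


queries = [f"q{i}" for i in range(4)]
seq_out, seq_s = timed(sequential(queries))
con_out, con_s = timed(concurrent(queries))
```

With four simulated searches, the sequential version takes roughly four delays while the concurrent one takes roughly one. Because vector search is typically I/O-bound, cooperative concurrency of this kind is usually enough to recover the latency; threads or processes are not required.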