Recipe

Multi-query RAG fanout

Decompose a single user query into multiple sub-queries, retrieve documents for each in parallel, then fuse results before generation.

Overview

Standard RAG runs a single retrieval per question. Multi-query RAG rewrites the user prompt into k distinct search queries, fans them out to your vector store, deduplicates and ranks the returned chunks, and feeds the fused set into the LLM. This tends to improve recall when the original question spans multiple topics or uses ambiguous phrasing.
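
The flow described above can be sketched end to end with stand-ins for the two external pieces. This is a minimal illustration, not a reference implementation: `rewrite` is a hypothetical placeholder for the LLM rewrite call (it just splits on " and "), `search` is a hypothetical placeholder for vector search (it scores by shared words), and `CORPUS` is made-up data.

```python
# Toy corpus standing in for a vector store (hypothetical data).
CORPUS = {
    "c1": "Postgres supports logical replication between clusters.",
    "c2": "Read replicas reduce load on the primary database.",
    "c3": "Connection pooling lowers per-request latency.",
}


def rewrite(query: str, k: int) -> list[str]:
    """Placeholder for the LLM rewrite call: split on ' and ', keep at most k."""
    parts = [p.strip() for p in query.split(" and ") if p.strip()]
    return parts[:k]


def search(sub_query: str, top_n: int = 2) -> list[str]:
    """Placeholder for vector search: rank chunk ids by shared lowercase words."""
    words = set(sub_query.lower().split())
    scored = []
    for chunk_id, text in CORPUS.items():
        overlap = len(words & set(text.lower().rstrip(".").split()))
        if overlap > 0:
            scored.append((-overlap, chunk_id))
    return [chunk_id for _, chunk_id in sorted(scored)][:top_n]


def multi_query_context(query: str, k: int = 3) -> list[str]:
    """Rewrite, search each sub-query, and return the deduplicated union of chunk ids."""
    seen: set[str] = set()
    context: list[str] = []
    for sub_query in rewrite(query, k):
        for chunk_id in search(sub_query):
            if chunk_id not in seen:  # drop chunks already returned by another sub-query
                seen.add(chunk_id)
                context.append(chunk_id)
    return context
```

Calling `multi_query_context("logical replication and connection pooling")` retrieves for each half of the question separately, so a chunk about pooling is found even though it shares no words with the replication half. In a real pipeline the returned chunks would then be placed into the generation prompt.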

Pipeline

Rewrite: the LLM generates k queries
Fanout: parallel vector search
Fuse: deduplicate and rank
Generate: the LLM answers with context
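
The recipe does not say how the Fuse step should rank chunks. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as this recipe's prescribed method. Each sub-query contributes `1 / (c + rank)` to a chunk's score, so chunks that appear in several result lists rise to the top, and deduplication falls out of keying scores by chunk id. The function name, the chunk ids, and the smoothing constant `c = 60` (a frequently used default) are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], c: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one deduplicated ranking.

    rankings: one ranked list of chunk ids per sub-query, best hit first.
    c: smoothing constant (assumed default); larger values flatten rank differences.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (c + rank)
    # Highest score first; ties broken by chunk id so the output is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Three sub-queries returned overlapping hits (hypothetical ids).
fused = reciprocal_rank_fusion([["a", "b", "c"], ["b", "d"], ["b", "a"]])
```

Here `b` appears in all three lists and ends up first, while `c` appears once at rank 3 and ends up last. RRF needs only ranks, not raw similarity scores, which is convenient when sub-queries produce scores on different scales.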

When to use

Questions that span multiple documents or domains
Ambiguous prompts needing disambiguation
High-recall requirements where missing a chunk is costly

Trade-offs

Higher token cost from the extra LLM rewrite call
Increased latency if the fanout searches do not run truly in parallel
Risk of noisy context if k is too large
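
The latency and noise trade-offs can both be addressed at the fanout step. The sketch below assumes the vector store client exposes an async search call. `fake_search` is a hypothetical stand-in that sleeps to simulate network latency, and `max_k` is an illustrative cap on the number of sub-queries.

```python
import asyncio
import time


async def fake_search(query: str, delay: float = 0.1) -> list[str]:
    """Hypothetical stand-in for an async vector store call."""
    await asyncio.sleep(delay)  # simulate network and index latency
    return [f"{query}-hit"]


async def fanout(queries: list[str], max_k: int = 4) -> list[list[str]]:
    """Run at most max_k searches concurrently; results keep the input order."""
    capped = queries[:max_k]  # cap k to limit noise and cost
    return await asyncio.gather(*(fake_search(q) for q in capped))
```

With concurrent execution, total search time is roughly that of the slowest single search rather than the sum of all of them. If the client library is synchronous only, a thread pool is one alternative way to get similar overlap.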
