
Multi-query RAG fanout

Decompose a single user query into multiple sub-queries, retrieve documents for each in parallel, then fuse results before generation.

Overview

Standard RAG issues a single retrieval per user prompt. Multi-query RAG instead rewrites the prompt into k distinct search queries, fans them out to your vector store, deduplicates and ranks the returned chunks, and passes the fused set to the LLM as context. This improves recall when the original question spans multiple topics or is phrased ambiguously, since a single query may match only one of the relevant topics.
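The retrieval flow described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `multi_query_retrieve`, `fake_rewrite`, and `fake_search` are hypothetical names, chunks are assumed to be dicts with a stable `id` field, and the searches run sequentially here to keep the deduplication logic easy to follow.

```python
from typing import Callable, Dict, List

Chunk = Dict[str, str]


def multi_query_retrieve(
    question: str,
    rewrite: Callable[[str, int], List[str]],
    search: Callable[[str], List[Chunk]],
    k: int = 3,
) -> List[Chunk]:
    """Rewrite the question into k queries, search each, and return
    the deduplicated union of chunks in first-seen order."""
    seen = set()
    union: List[Chunk] = []
    for query in rewrite(question, k):
        for chunk in search(query):
            # Assumption: every chunk carries a stable "id" usable for dedup.
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                union.append(chunk)
    return union


# Toy stand-ins for the LLM rewrite call and the vector store (hypothetical).
def fake_rewrite(question: str, k: int) -> List[str]:
    return [f"{question} (variant {i})" for i in range(k)]


def fake_search(query: str) -> List[Chunk]:
    hits = {"0": ["a", "b"], "1": ["b", "c"], "2": ["c", "a"]}
    variant = query.rstrip(")").split()[-1]
    return [{"id": cid, "text": f"chunk {cid}"} for cid in hits[variant]]


chunks = multi_query_retrieve("what is X", fake_rewrite, fake_search, k=3)
print([c["id"] for c in chunks])
```

In a real pipeline, `rewrite` would wrap an LLM call and `search` would wrap your vector store client; the returned chunks would then be formatted into the generation prompt.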

Pipeline

  1. Rewrite: LLM generates k queries
  2. Fanout: Parallel vector search
  3. Fuse: Deduplicate & rank
  4. Generate: LLM answers with context
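The recipe does not prescribe a ranking method for the Fuse step. One common choice is reciprocal rank fusion (RRF), sketched below as an assumption rather than as this recipe's required approach. The function name and the `rrf_constant` parameter are illustrative; 60 is a commonly used default for that constant. Each input list is assumed to hold chunk ids ordered best-first for one sub-query.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(
    ranked_lists: List[List[str]], rrf_constant: int = 60
) -> List[str]:
    """Merge per-query rankings into one deduplicated ranking.

    A chunk's score is the sum of 1 / (rrf_constant + rank) over every
    list it appears in, so chunks returned by several sub-queries rise.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (rrf_constant + rank)
    # Highest score first; ties are broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from three sub-queries, best match first.
per_query_results = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "e"],
]
fused = reciprocal_rank_fusion(per_query_results)
print(fused)  # ['b', 'a', 'e', 'c', 'd']
```

Because RRF uses only ranks, it does not require similarity scores from different sub-queries to be comparable, which is one reason it is a convenient default for fusing fanout results.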

When to use

  • Questions that span multiple documents or domains
  • Ambiguous prompts needing disambiguation
  • High-recall requirements where missing a chunk is costly

Trade-offs

  • Higher token cost from the extra LLM rewrite call
  • Increased latency if the k searches run sequentially rather than concurrently
  • Risk of noisy context if k is too large
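The latency trade-off can be illustrated with a toy comparison. The sketch below assumes an async search client, simulated here by a hypothetical `fake_search` that sleeps for 50 ms; it compares awaiting each sub-query in turn against issuing them together with `asyncio.gather`. Real timings depend on your vector store and network.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, Tuple


async def fake_search(query: str, delay: float = 0.05) -> List[str]:
    # Stand-in for a network round trip to the vector store (hypothetical).
    await asyncio.sleep(delay)
    return [f"hit for {query}"]


async def sequential(queries: List[str]) -> List[List[str]]:
    # Each search waits for the previous one: latency grows with k.
    return [await fake_search(q) for q in queries]


async def concurrent(queries: List[str]) -> List[List[str]]:
    # All searches are in flight at once: latency is roughly one round trip.
    return list(await asyncio.gather(*(fake_search(q) for q in queries)))


def timed(
    fn: Callable[[List[str]], Awaitable[List[List[str]]]], queries: List[str]
) -> Tuple[List[List[str]], float]:
    start = time.perf_counter()
    results = asyncio.run(fn(queries))
    return results, time.perf_counter() - start


queries = [f"q{i}" for i in range(4)]
seq_results, seq_time = timed(sequential, queries)
con_results, con_time = timed(concurrent, queries)
print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

Both variants return the same results in the same order, since `asyncio.gather` preserves the order of its inputs; only the wall-clock time differs.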
Meridian Docs — getnimbus.net