Recipe: Map-reduce over LLM calls

Split large inputs into chunks, process each with an LLM in parallel, then merge results into a single coherent output.

Problem

LLM context windows are finite. When you need to analyze a 200-page document or summarize thousands of support tickets, the input will not fit in a single prompt. You need a strategy that scales.

Solution

The map-reduce pattern splits the workload into independent chunks, sends each chunk to the LLM concurrently with the same instruction (map), then feeds all partial results into a final call (reduce) that produces the finished answer.
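
A minimal sketch of that flow in Python, using asyncio for the concurrent map calls. The `call_llm` function is a hypothetical stand-in that returns a canned string; it is not Meridian's API or any real client, and `map_reduce` and its parameter names are assumptions made for illustration.

```python
import asyncio


async def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM client call; swap in your own."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"[summary of {len(prompt)} chars]"


async def map_reduce(chunks: list[str], map_instruction: str, reduce_instruction: str) -> str:
    # Map: the same instruction is applied to every chunk, concurrently.
    partials = await asyncio.gather(
        *(call_llm(f"{map_instruction}\n\n{chunk}") for chunk in chunks)
    )
    # Reduce: one final call sees every partial result, in input order.
    combined = "\n\n".join(partials)
    return await call_llm(f"{reduce_instruction}\n\n{combined}")


if __name__ == "__main__":
    answer = asyncio.run(
        map_reduce(["first chunk", "second chunk"], "Summarize:", "Merge these summaries:")
    )
    print(answer)
```

`asyncio.gather` returns results in the order the calls were submitted, so the reduce prompt sees partial results in document order even if the calls finish out of order.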

Steps

1. Chunk the input. Split by token count, paragraph boundaries, or semantic breaks. Overlap chunks by 10-20% to preserve context across boundaries.
2. Map phase. Send each chunk to the LLM with an identical instruction. Run all calls in parallel. Collect structured partial results.
3. Reduce phase. Feed all partial outputs into a single final prompt. Ask the LLM to synthesize, deduplicate, and format the consolidated result.
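
Step 1 can be sketched as a sliding window with overlap. This example is an illustration under assumptions: it counts whitespace-separated words as a rough proxy for tokens (a real tokenizer would give different counts), and `chunk_with_overlap` with its parameters is a hypothetical helper, not part of any library.

```python
def chunk_with_overlap(
    words: list[str], chunk_size: int, overlap_ratio: float = 0.15
) -> list[list[str]]:
    """Split a word list into windows that share part of their content."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # at least 1, because overlap < chunk_size

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the input
    return chunks


if __name__ == "__main__":
    text = "one two three four five six seven eight nine ten eleven twelve"
    for chunk in chunk_with_overlap(text.split(), chunk_size=5, overlap_ratio=0.2):
        print(" ".join(chunk))
```

Each window starts `chunk_size - overlap` words after the previous one, so the tail of one chunk reappears at the head of the next. Splitting on paragraph or semantic boundaries would need a different splitter; only the fixed-size case is shown here.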

Trade-offs

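+ Near-linear speedup in the map phase from parallel calls, up to your concurrency and rate limits.
+ Handles inputs far larger than any single context window.
− Cross-chunk relationships may be lost if overlap is too small.
− Two sequential rounds of LLM calls (map, then reduce) put the latency floor at roughly twice that of a single call.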
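
The latency trade-offs can be made concrete with a back-of-the-envelope model. This is a rough sketch that assumes every call takes the same time, which real workloads will not; `estimated_latency` and the numbers in the usage example are illustrative assumptions, not measurements.

```python
import math


def estimated_latency(n_chunks: int, call_seconds: float, max_concurrency: int) -> float:
    """Rough wall-clock estimate: waves of parallel map calls plus one reduce call."""
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be positive")
    map_waves = math.ceil(n_chunks / max_concurrency)
    return map_waves * call_seconds + call_seconds  # map phase + reduce phase


if __name__ == "__main__":
    # Illustrative numbers only: 40 chunks, 5 seconds per call.
    print(estimated_latency(40, 5.0, max_concurrency=40))  # fully parallel map
    print(estimated_latency(40, 5.0, max_concurrency=10))  # four waves of ten
    print(estimated_latency(40, 5.0, max_concurrency=1))   # sequential map
```

With unlimited concurrency the estimate bottoms out at two call durations, one for the map wave and one for the reduce call, no matter how many chunks there are.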

Ready to build this? See the Quickstart guide for Meridian's LLM routing and concurrency primitives.