Recipe: Map-reduce over LLM calls
Split large inputs into chunks, process each with an LLM in parallel, then merge results into a single coherent output.
Problem
LLM context windows are finite. A 200-page document or a backlog of thousands of support tickets will not fit in a single prompt, so you need a strategy that scales beyond the window.
Solution
The map-reduce pattern splits the workload into independent chunks, sends each chunk to the LLM concurrently (map), then feeds all partial results into a final call (reduce) that produces the finished answer.
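The pattern can be sketched in a few lines of Python with `asyncio`. This is a minimal illustration, not Meridian's API: `map_reduce`, `call_llm`, `max_concurrency`, and the `fake_llm` stand-in are hypothetical names, and the concurrency cap is an assumption added because most providers rate-limit parallel requests.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical signature: any async function that takes a prompt and returns text.
LLMCall = Callable[[str], Awaitable[str]]


async def map_reduce(
    chunks: list[str],
    call_llm: LLMCall,
    map_instruction: str,
    reduce_instruction: str,
    max_concurrency: int = 8,
) -> str:
    """Apply the same instruction to every chunk concurrently, then reduce once."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def map_one(chunk: str) -> str:
        async with semaphore:
            return await call_llm(f"{map_instruction}\n\n{chunk}")

    # gather() preserves input order, so partials line up with their chunks.
    partials = await asyncio.gather(*(map_one(c) for c in chunks))

    # Reduce: a single final call over all labeled partial results.
    joined = "\n\n".join(f"[Part {i + 1}]\n{p}" for i, p in enumerate(partials))
    return await call_llm(f"{reduce_instruction}\n\n{joined}")


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: reports the prompt's word count."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{len(prompt.split())} words"
```

To try it without a model, run `asyncio.run(map_reduce(["alpha beta", "gamma"], fake_llm, "Summarize:", "Merge:"))`. Swapping `fake_llm` for a real client call is the only change needed, provided that call is an async function from prompt to text.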
Steps
1. Chunk the input. Split by token count, paragraph boundaries, or semantic breaks. Overlap chunks by 10-20% to preserve context across boundaries.
2. Map phase. Send each chunk to the LLM with an identical instruction. Run all calls in parallel. Collect structured partial results.
3. Reduce phase. Feed all partial outputs into a single final prompt. Ask the LLM to synthesize, deduplicate, and format the consolidated result.
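The chunking step is the easiest one to get subtly wrong, so here is a hedged sketch of fixed-size chunking with overlap. It counts whitespace-separated words as a rough proxy for tokens, which is an assumption; a production version would likely count with the target model's tokenizer. The function name `chunk_with_overlap` and its defaults are illustrative only.

```python
def chunk_with_overlap(text: str, chunk_size: int = 200, overlap: float = 0.15) -> list[str]:
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share roughly `overlap` (a fraction) of their words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")

    words = text.split()
    # Advance by the non-overlapping part of each chunk (at least one word).
    step = max(1, round(chunk_size * (1 - overlap)))

    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the input
    return chunks
```

With `chunk_size=4` and `overlap=0.25`, each chunk starts three words after the previous one, so the last word of one chunk is repeated as the first word of the next. Splitting on paragraph or semantic boundaries instead would replace the `text.split()` step with a boundary-aware splitter.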
Trade-offs
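- Pro: Near-linear speedup in the map phase from parallel calls, subject to provider rate limits.
- Pro: Handles inputs far larger than any single context window.
- Con: Cross-chunk relationships may be lost if overlap is too small.
- Con: Two sequential rounds of LLM calls (map, then reduce) double the latency floor compared with a single call.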
Ready to build this? See the Quickstart guide for Meridian's LLM routing and concurrency primitives.