Recipe: Long-context summarization strategy
How Meridian compresses multi-turn agent traces and large documents without losing signal.
Problem
Agent loops, RAG retrievals, and long-running tool calls produce more context than a model's window can hold. Naive truncation drops critical instructions or mid-task state.
Strategy
Meridian uses a three-pass pipeline: extract, rank, and fuse.
- Extract — parse the trace into atomic segments: tool calls, observations, user messages, and system directives.
- Rank — score each segment by recency, semantic relevance to the current query, and whether it contains an unresolved action.
- Fuse — assemble a compressed window: keep top-ranked segments, collapse repetitive tool outputs into summaries, and prepend a “state snapshot” so the model resumes cleanly.
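The three passes above can be sketched in a few dozen lines of Python. This is an illustrative sketch, not Meridian's actual code: the `Segment` fields, the function names, the scoring weights, the word-overlap stand-in for semantic relevance, and the snapshot format are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    index: int                # position in the original trace
    kind: str                 # "system", "user", "tool_call", or "observation"
    text: str
    unresolved: bool = False  # True if the segment holds a pending action


def extract(trace):
    """Extract: turn a list of message dicts into atomic segments."""
    return [
        Segment(i, m["kind"], m["text"], m.get("unresolved", False))
        for i, m in enumerate(trace)
    ]


def relevance(text, query):
    """Hypothetical stand-in for semantic relevance: share of query words in text."""
    q = set(query.lower().split())
    if not q:
        return 0.0
    return len(q & set(text.lower().split())) / len(q)


def rank(segments, query, w_recency=0.4, w_relevance=0.4, w_unresolved=0.2):
    """Rank: score by recency, relevance to the query, and unresolved actions."""
    last = max(len(segments) - 1, 1)
    scored = [
        (
            w_recency * (s.index / last)
            + w_relevance * relevance(s.text, query)
            + w_unresolved * (1.0 if s.unresolved else 0.0),
            s,
        )
        for s in segments
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def fuse(ranked, budget):
    """Fuse: keep the top `budget` segments, collapse repeated tool outputs,
    and prepend a state snapshot listing pending actions."""
    kept = sorted((s for _, s in ranked[:budget]), key=lambda s: s.index)
    repeats = Counter(s.text for s in kept if s.kind == "observation")
    emitted = set()
    lines = []
    for s in kept:
        if s.kind == "observation":
            if s.text in emitted:
                continue  # already represented by its first occurrence
            emitted.add(s.text)
            n = repeats[s.text]
            suffix = f" [repeated {n}x]" if n > 1 else ""
            lines.append(f"observation: {s.text}{suffix}")
        else:
            lines.append(f"{s.kind}: {s.text}")
    pending = [s.text for s in kept if s.unresolved]
    snapshot = "state snapshot: pending=" + (", ".join(pending) or "none")
    return [snapshot] + lines
```

Calling `fuse(rank(extract(trace), query), budget)` returns the compressed window as a list of lines, in original order, with the snapshot first. The sketch pins nothing on its own; a real pipeline would also need rules that exempt certain segments from being dropped.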
Heuristics
- Never drop the system prompt or safety preamble.
- Preserve the last two turns in full.
- Always keep tool errors verbatim.
- Replace long file contents with a hash plus the first and last 200 tokens.
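A minimal sketch of how these heuristics might look in code. Several details are assumptions, not taken from Meridian: the function names, SHA-256 as the hash, whitespace-separated words standing in for model tokens, and one message counting as one turn.

```python
import hashlib


def elide_file_contents(text, keep=200):
    """Replace long content with a hash plus its first and last `keep` tokens.

    Assumption: whitespace-separated words approximate model tokens.
    """
    tokens = text.split()
    if len(tokens) <= 2 * keep:
        return text  # short enough to keep in full
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    head = " ".join(tokens[:keep])
    tail = " ".join(tokens[len(tokens) - keep:])
    omitted = len(tokens) - 2 * keep
    return f"[sha256:{digest}] {head} [... {omitted} tokens omitted ...] {tail}"


def is_pinned(index, total, kind, is_error=False):
    """Return True if a segment must survive compression untouched."""
    if kind == "system":       # system prompt or safety preamble
        return True
    if is_error:               # tool errors are kept verbatim
        return True
    return index >= total - 2  # last two turns (assumes one message per turn)
```

Pinned segments would bypass ranking entirely, so a low score can never evict them. The hash lets a later step detect whether the elided file changed without re-reading its full contents.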
Result
Meridian typically achieves a 4:1 compression ratio on agent traces, with zero regressions on task-completion benchmarks. The approach works across GPT-4, Claude, and Gemini models.