Recipe: Long-context summarization strategy

How Meridian compresses multi-turn agent traces and large documents without losing signal.

Problem

Agent loops, RAG retrievals, and long-running tool calls produce more context than a model's context window can hold. Naive truncation drops critical instructions or mid-task state.

Strategy

Meridian uses a three-pass pipeline: extract, rank, and fuse.

Extract — parse the trace into atomic segments: tool calls, observations, user messages, and system directives.
Rank — score each segment by recency, semantic relevance to the current query, and whether it contains an unresolved action.
Fuse — assemble a compressed window: keep top-ranked segments, collapse repetitive tool outputs into summaries, and prepend a “state snapshot” so the model resumes cleanly.
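
The three passes above can be sketched roughly as follows. This is a minimal illustration, not Meridian's actual implementation: the `Segment` class, the function names, the dict shape of trace entries, the word-overlap relevance measure, and the 0.4/0.4/0.2 weights are all assumptions made for the example. It also leaves out the collapsing of repetitive tool outputs and does not pin the system prompt, so those rules would need to be layered on top.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    index: int                # position in the original trace
    kind: str                 # e.g. "system", "user", "tool_call", "observation"
    text: str
    unresolved: bool = False  # True if the segment holds a pending action


def extract(trace):
    """Pass 1: turn raw trace entries (dicts) into atomic segments."""
    return [
        Segment(i, entry["kind"], entry["text"], entry.get("unresolved", False))
        for i, entry in enumerate(trace)
    ]


def rank(segments, query):
    """Pass 2: score each segment by recency, query overlap, and pending actions."""
    query_words = set(query.lower().split())
    last = max(len(segments) - 1, 1)
    scored = []
    for seg in segments:
        recency = seg.index / last
        words = set(seg.text.lower().split())
        # Word overlap is a crude stand-in for semantic relevance.
        relevance = len(words & query_words) / max(len(query_words), 1)
        pending = 1.0 if seg.unresolved else 0.0
        # Illustrative weights only; real weights would need tuning.
        scored.append((0.4 * recency + 0.4 * relevance + 0.2 * pending, seg))
    return scored


def fuse(scored, budget):
    """Pass 3: keep the top `budget` segments in trace order, behind a snapshot."""
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:budget]
    kept = sorted((seg for _, seg in top), key=lambda seg: seg.index)
    pending = [seg.text for seg in kept if seg.unresolved]
    snapshot = "STATE SNAPSHOT: pending=" + ("; ".join(pending) or "none")
    return [snapshot] + [seg.text for seg in kept]


trace = [
    {"kind": "system", "text": "You are a deploy agent"},
    {"kind": "user", "text": "deploy the billing service"},
    {"kind": "tool_call", "text": "list services"},
    {"kind": "observation", "text": "billing auth search"},
    {"kind": "tool_call", "text": "deploy billing", "unresolved": True},
]
window = fuse(rank(extract(trace), "deploy billing"), budget=3)
```

Restoring the original order after selection keeps the compressed window readable as a timeline, which matters when the model has to resume a task mid-way.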

Heuristics

Never drop the system prompt or safety preamble.
Preserve the last two turns in full.
Keep tool errors verbatim.
Replace long file contents with a hash plus the first and last 200 tokens.
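
As a rough sketch of how these heuristics might be applied, the code below marks which segments are exempt from compression and elides the middle of long file contents. The names `pinned_indices` and `elide_file_contents`, the segment dict fields (`kind`, `turn`), the kind labels, the use of whitespace-separated words as a stand-in for model tokens, and the choice of a truncated SHA-256 digest are all assumptions for illustration; the source does not say which hash or tokenizer Meridian uses.

```python
import hashlib

EDGE_TOKENS = 200  # tokens kept at each end of long file contents


def pinned_indices(segments):
    """Return indices of segments that must survive compression untouched."""
    if not segments:
        return set()
    last_turn = max(seg["turn"] for seg in segments)
    pinned = set()
    for i, seg in enumerate(segments):
        if seg["kind"] in ("system", "safety"):
            pinned.add(i)  # never drop the system prompt or safety preamble
        elif seg["turn"] > last_turn - 2:
            pinned.add(i)  # preserve the last two turns in full
        elif seg["kind"] == "tool_error":
            pinned.add(i)  # tool errors are kept verbatim
    return pinned


def elide_file_contents(text, edge=EDGE_TOKENS):
    """Replace the middle of long file contents with a content hash."""
    if edge <= 0:
        raise ValueError("edge must be positive")
    tokens = text.split()  # assumption: whitespace words approximate tokens
    if len(tokens) <= 2 * edge:
        return text  # short enough to keep in full
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    head = " ".join(tokens[:edge])
    tail = " ".join(tokens[-edge:])
    return f"[sha256:{digest}] {head} ... {tail}"


segments = [
    {"kind": "system", "turn": 0},
    {"kind": "tool_error", "turn": 1},
    {"kind": "observation", "turn": 1},
    {"kind": "user", "turn": 2},
    {"kind": "observation", "turn": 3},
]
keep = pinned_indices(segments)

long_file = " ".join(f"w{i}" for i in range(500))
elided = elide_file_contents(long_file)
```

Hashing the full contents lets a later step detect whether the file changed between reads without keeping the whole body in the window.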

Result

Meridian typically achieves a 4:1 compression ratio on agent traces, with zero regressions on task-completion benchmarks. The strategy works across GPT-4, Claude, and Gemini models.
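
To check the ratio on your own traces, a small helper like the one below may be enough. It is a sketch under a stated assumption: it counts whitespace-separated words rather than real model tokens, so the figure is only approximate, and `compression_ratio` is a hypothetical name rather than part of Meridian.

```python
def compression_ratio(original, compressed):
    """Return original size divided by compressed size, in approximate tokens."""
    original_tokens = len(original.split())
    compressed_tokens = len(compressed.split())
    if compressed_tokens == 0:
        raise ValueError("compressed text is empty")
    return original_tokens / compressed_tokens


ratio = compression_ratio("a b c d e f g h", "a h")
```

A ratio of 4.0 here corresponds to the 4:1 figure quoted above.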
