Recipe: Long-context summarization strategy

How Meridian compresses multi-turn agent traces and large documents without losing signal.

Problem

Agent loops, RAG retrievals, and long-running tool calls produce more context than a model's window can hold. Naive truncation drops critical instructions or mid-task state.

Strategy

Meridian uses a three-pass pipeline: extract, rank, and fuse.

  1. Extract — parse the trace into atomic segments: tool calls, observations, user messages, and system directives.
  2. Rank — score each segment by recency, semantic relevance to the current query, and whether it contains an unresolved action.
  3. Fuse — assemble a compressed window: keep top-ranked segments, collapse repetitive tool outputs into summaries, and prepend a “state snapshot” so the model resumes cleanly.
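
The three passes above can be sketched as follows. This is a minimal illustration, not Meridian's actual implementation: the names Segment, extract, rank, and fuse, the equal weighting of the three ranking signals, the word-overlap relevance measure, and the word-count budget are all assumptions made for the example. A real system would likely use embeddings for relevance and a model tokenizer for the budget. The sketch also does not apply the rules listed under Heuristics.

```python
from dataclasses import dataclass

# Hypothetical names and weights throughout; none of this is Meridian's API.


@dataclass
class Segment:
    kind: str                 # "system", "user", "tool_call", or "observation"
    text: str
    turn: int                 # turn the segment came from
    unresolved: bool = False  # True if it holds an action not yet completed


def extract(trace: list[dict]) -> list[Segment]:
    """Pass 1: turn raw trace records into atomic segments."""
    return [
        Segment(r["kind"], r["text"], r["turn"], r.get("unresolved", False))
        for r in trace
    ]


def relevance(text: str, query: str) -> float:
    """Word-overlap stand-in for semantic relevance (an assumption)."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def rank(segments: list[Segment], query: str) -> list[float]:
    """Pass 2: score by recency, relevance to the query, and unresolved actions."""
    if not segments:
        return []
    last = max(s.turn for s in segments)
    return [
        1.0 / (1 + last - s.turn)           # recency, in (0, 1]
        + relevance(s.text, query)          # relevance, in [0, 1]
        + (1.0 if s.unresolved else 0.0)    # bonus for an unresolved action
        for s in segments
    ]


def fuse(segments: list[Segment], scores: list[float], budget: int) -> list[str]:
    """Pass 3: keep top-ranked segments within a word budget, collapse
    repeated tool outputs, and prepend a state snapshot."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept, used = set(), 0
    for i in order:
        cost = len(segments[i].text.split())  # words as a proxy for tokens
        if used + cost <= budget:
            kept.add(i)
            used += cost

    merged: list[list] = []                   # [line, repeat_count] pairs
    for i in sorted(kept):                    # restore chronological order
        s = segments[i]
        line = f"[{s.kind}] {s.text}"
        if s.kind == "observation" and merged and merged[-1][0] == line:
            merged[-1][1] += 1                # identical consecutive tool output
        else:
            merged.append([line, 1])
    body = [l if n == 1 else f"{l} (repeated {n}x)" for l, n in merged]

    # The snapshot is built from all segments, so unresolved actions
    # survive even when their segments did not fit the budget.
    pending = [s.text for s in segments if s.unresolved]
    snapshot = "[state] unresolved: " + ("; ".join(pending) or "none")
    return [snapshot] + body
```

Calling fuse(segs, rank(segs, query), budget) on the output of extract returns the compressed window as a list of lines, snapshot first. Restoring chronological order after selection is a design choice of this sketch: it keeps cause and effect readable for the model even though selection itself is by score.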

Heuristics

  • Never drop the system prompt or safety preamble.
  • Preserve the last two turns in full.
  • Tool errors are always kept verbatim.
  • Long file contents are replaced with a content hash plus the first 200 and last 200 tokens.
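
The four rules above could be expressed as two small helpers, sketched here under stated assumptions. The function names elide_file and is_pinned are hypothetical, the "safety" segment kind is an invented label, SHA-256 is one possible hash choice (the recipe does not name one), and whitespace-separated words stand in for model tokens.

```python
import hashlib

# Hypothetical helpers for illustration; not Meridian's actual API.


def elide_file(content: str, keep: int = 200) -> str:
    """Replace long file contents with a hash plus the first/last `keep` tokens."""
    tokens = content.split()   # whitespace tokens as a stand-in for model tokens
    if len(tokens) <= 2 * keep:
        return content         # short enough to keep in full
    # Hash the full original so the elided file can still be identified.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:12]
    omitted = len(tokens) - 2 * keep
    head = " ".join(tokens[:keep])     # note: original whitespace is not preserved
    tail = " ".join(tokens[-keep:])
    return f"{head}\n[{omitted} tokens omitted, sha256:{digest}]\n{tail}"


def is_pinned(kind: str, turn: int, last_turn: int, is_error: bool = False) -> bool:
    """True if a segment must be kept verbatim regardless of its rank."""
    if kind in ("system", "safety"):   # system prompt or safety preamble
        return True
    if is_error:                       # tool errors are kept verbatim
        return True
    return turn > last_turn - 2        # belongs to one of the last two turns
```

In a pipeline like the one described under Strategy, pinned segments would be added to the window before any ranked selection, and their cost subtracted from the budget first.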

Result

The typical compression ratio is 4:1 on agent traces, with no regressions on task-completion benchmarks. The strategy works across GPT-4, Claude, and Gemini models.
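
For reference, a compression ratio here is read as original tokens divided by compressed tokens. The helper below is a hypothetical sketch of that arithmetic, and the token counts in its comments are made-up illustrations, not measurements.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Ratio of original to compressed size; 4.0 corresponds to 4:1."""
    if compressed_tokens <= 0:
        raise ValueError("compressed_tokens must be positive")
    return original_tokens / compressed_tokens


def compressed_size(original_tokens: int, ratio: float) -> int:
    """Expected window size after compression at the given ratio."""
    return round(original_tokens / ratio)


# Illustrative numbers only: a 32,000-token trace compressed to 8,000 tokens.
ratio = compression_ratio(32_000, 8_000)   # 4.0, i.e. 4:1
size = compressed_size(32_000, 4.0)        # 8000
```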