
Recipe

Chat history trimming

Long conversations exceed model context windows and inflate token costs. This recipe shows three patterns for trimming chat history on the Meridian gateway so your agent stays fast, cheap, and coherent across hundreds of turns without losing recent intent.

1. Sliding window

Keep the system prompt plus the last N message pairs. This is the cheapest strategy and works for short-task agents that don't need long-range memory. Drop everything older than the window before you send the request to the gateway.
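As a minimal sketch of the sliding window, assuming OpenAI-style messages of the form { role, content } (the recipe does not specify the message shape) and a hypothetical helper name slidingWindow:

```javascript
// Sliding-window trim: keep the system prompt plus the last `maxPairs`
// user/assistant pairs. The { role, content } shape is an assumption.
function slidingWindow(messages, maxPairs = 10) {
  if (messages.length === 0) return [];

  // Only treat the first message as the system prompt if it really is one.
  const hasSystem = messages[0].role === "system";
  const system = hasSystem ? [messages[0]] : [];
  const rest = hasSystem ? messages.slice(1) : messages;

  // Two messages per pair. slice(-0) would return everything, so guard it.
  const keep = Math.max(0, maxPairs) * 2;
  let tail = keep === 0 ? [] : rest.slice(-keep);

  // Do not start the window on a reply whose question was dropped.
  while (tail.length > 0 && tail[0].role !== "user") {
    tail = tail.slice(1);
  }

  return [...system, ...tail];
}
```

If the history ends with a user message that has no reply yet, the window may hold one pair fewer than maxPairs, because the orphaned assistant reply at the start of the window is dropped.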

2. Token-budgeted truncation

Count tokens per message and drop the oldest turns until the history fits the budget. Reserve roughly 25% of the model's context for the response. Meridian returns token counts in the usage block, so you can correct your estimate on the next call.
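The budget arithmetic might look like the sketch below. The helper names (inputBudget, calibrate, adjustedBudget) are hypothetical, and the reported prompt-token count is passed in as a plain number because the exact field names in Meridian's usage block are not given here.

```javascript
// Share of the context window left for input after reserving room for
// the response (the recipe suggests reserving roughly 25%).
function inputBudget(contextWindow, reserveRatio = 0.25) {
  return Math.floor(contextWindow * (1 - reserveRatio));
}

// Ratio of the gateway-reported prompt tokens to the local estimate.
// A value above 1 means the local counter undercounts.
function calibrate(estimatedTokens, reportedPromptTokens) {
  if (estimatedTokens <= 0 || reportedPromptTokens <= 0) return 1;
  return reportedPromptTokens / estimatedTokens;
}

// Shrink (or grow) the local budget so that locally counted tokens
// line up with what the gateway actually measured.
function adjustedBudget(budget, scale) {
  return Math.floor(budget / scale);
}

// Example: an 8,000-token context leaves 6,000 tokens for input.
const budget = inputBudget(8000);
// The last call was estimated at 1,000 tokens but reported as 1,250.
const scale = calibrate(1000, 1250);
// Trim against 4,800 locally counted tokens on the next call.
const nextBudget = adjustedBudget(budget, scale);
```

Scaling the budget rather than each message count keeps the correction in one place, so the trimming loop itself does not need to change.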

3. Summarize-and-replace

When the window fills, hand the oldest half of the history to a cheap summarizer (for example gpt-4o-mini or llama4-mav) and replace those turns with a single assistant message containing the summary. This preserves long-range context at a roughly fixed token cost.
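One way to sketch summarize-and-replace, assuming { role, content } messages with the system prompt first. The summarize argument is a hypothetical async callback that sends a transcript to whichever cheap model you choose and resolves to a summary string; no specific gateway call is implied.

```javascript
// Summarize-and-replace: fold the oldest half of the history into a
// single assistant message. `summarize` is a hypothetical async callback
// (transcript string in, summary string out).
async function summarizeOldestHalf(messages, summarize) {
  const [system, ...rest] = messages;

  // Too short to be worth a summarizer call.
  if (rest.length < 4) return messages;

  // Cut on an even index so user/assistant pairs stay together.
  let cut = Math.floor(rest.length / 2);
  if (cut % 2 === 1) cut -= 1;

  const oldest = rest.slice(0, cut);
  const transcript = oldest
    .map((m) => `${m.role}: ${m.content}`)
    .join("\n");

  const summary = await summarize(transcript);

  const summaryMessage = {
    role: "assistant",
    content: `Summary of earlier conversation: ${summary}`,
  };

  // System prompt, then the summary, then the untouched recent turns.
  return [system, summaryMessage, ...rest.slice(cut)];
}
```

Because the summary replaces a variable number of turns with one message, repeated calls keep the history bounded as long as the summarizer output itself stays short.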

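// Token-budgeted truncation (pattern 2): keep the system prompt, then add
// turns from newest to oldest until the next one would exceed maxTokens.
// countTokens is assumed to be supplied by your tokenizer; it is not defined here.
function trimHistory(messages, maxTokens = 8000) {
  if (messages.length === 0) return [];
  const sys = messages[0]; // assumes the first message is the system prompt
  const rest = messages.slice(1);
  let total = countTokens(sys);
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) {
    const t = countTokens(rest[i]);
    if (total + t > maxTokens) break; // stop at the first turn that does not fit
    total += t;
    kept.unshift(rest[i]); // prepend so chronological order is preserved
  }
  return [sys, ...kept];
}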