Recipe

Summarize chat history

Compress long multi-turn conversations into a compact running summary so your agent stays under context limits without losing the thread. This recipe shows how to build a sliding-window summarizer on top of the Meridian chat completions endpoint, with the model set to azure/model-router.

1. Decide when to summarize

Track the approximate token usage of the live conversation. When it crosses roughly 70% of the model's context window, fold the oldest N messages into a single summary turn and keep the most recent K messages verbatim. Typical defaults: N = 20 (the oldest 20 messages) and K = 6 (the most recent 6 messages).
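The trigger and the split described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the four-characters-per-token estimate, the per-message overhead, and the 128,000-token context window are assumptions chosen for illustration, and the function names are hypothetical. Swap in a real tokenizer and the actual window size of the model you are routed to.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]

# Assumption: illustrative window size; use the real limit for your model.
CONTEXT_WINDOW_TOKENS = 128_000
SUMMARIZE_THRESHOLD = 0.70  # "roughly 70%" from the recipe
FOLD_OLDEST_N = 20          # oldest messages folded into the summary
KEEP_RECENT_K = 6           # most recent messages kept verbatim


def estimate_tokens(messages: List[Message]) -> int:
    """Rough estimate: ~4 characters per token plus a small per-message
    overhead. This is a heuristic (an assumption), not an exact count."""
    return sum(len(m["content"]) // 4 + 4 for m in messages)


def should_summarize(messages: List[Message],
                     context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """True once estimated usage reaches the threshold share of the window."""
    return estimate_tokens(messages) >= SUMMARIZE_THRESHOLD * context_window


def split_for_summary(messages: List[Message],
                      n: int = FOLD_OLDEST_N,
                      k: int = KEEP_RECENT_K) -> Tuple[List[Message], List[Message]]:
    """Return (messages to fold, messages to keep).

    At most n of the oldest messages are folded, and the last k
    messages are never included in the folded chunk.
    """
    fold_count = min(n, max(len(messages) - k, 0))
    return messages[:fold_count], messages[fold_count:]
```

Capping the folded chunk at `len(messages) - k` means a short conversation never loses its most recent turns, even if the threshold fires early because individual messages are very long.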

2. Call the summarizer

Send the chunk to be summarized as a single user message, with the summarization instruction in the system message. Meridian routes the request to the cheapest healthy model that still hits your quality bar.

curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/model-router",
    "messages": [
      {"role": "system", "content": "Summarize the conversation below in <= 200 words. Preserve user goals, decisions made, and open questions."},
      {"role": "user", "content": "<paste prior turns here>"}
    ],
    "max_tokens": 400
  }'
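The same request can be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names (`render_transcript`, `build_summary_payload`, `summarize`) are hypothetical, the "role: content" transcript format is one possible way to flatten prior turns, and the response is assumed to follow the common chat-completions shape with the text at `choices[0].message.content`. Check the Meridian response format before relying on that last part.

```python
import json
import urllib.request
from typing import Dict, List

Message = Dict[str, str]

MERIDIAN_URL = "https://llm.getnimbus.net/v1/chat/completions"
SUMMARY_INSTRUCTION = (
    "Summarize the conversation below in <= 200 words. "
    "Preserve user goals, decisions made, and open questions."
)


def render_transcript(turns: List[Message]) -> str:
    """Flatten prior turns into one string, one 'role: content' line each.
    (Assumption: any readable transcript format would work here.)"""
    return "\n".join(f"{t['role']}: {t['content']}" for t in turns)


def build_summary_payload(turns: List[Message]) -> dict:
    """Build the same JSON body as the curl example."""
    return {
        "model": "azure/model-router",
        "messages": [
            {"role": "system", "content": SUMMARY_INSTRUCTION},
            {"role": "user", "content": render_transcript(turns)},
        ],
        "max_tokens": 400,
    }


def summarize(turns: List[Message], api_key: str) -> str:
    """POST the payload and return the summary text."""
    request = urllib.request.Request(
        MERIDIAN_URL,
        data=json.dumps(build_summary_payload(turns)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumption: response follows the usual chat-completions shape.
    return body["choices"][0]["message"]["content"]
```

A typical call would be `summarize(oldest_messages, os.environ["MERIDIAN_KEY"])`, reading the key from the same environment variable the curl example uses.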

3. Splice the summary back in

Replace the summarized prefix with a single system message: {"role": "system", "content": "Prior conversation summary: ..."}. Keep the last K turns verbatim after it. On the next overflow, summarize the previous summary together with the new middle chunk so that context stays bounded.
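The splice step can be sketched as a small pure function. This is an illustrative sketch, not the only way to do it: `splice_summary` is a hypothetical name, and the `summarize` argument stands in for whatever function calls the summarizer endpoint. Because the earlier summary sits at the front of the history, it naturally falls into the oldest-N chunk on the next overflow and gets re-summarized along with the newer turns.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

SUMMARY_PREFIX = "Prior conversation summary: "


def splice_summary(messages: List[Message],
                   summarize: Callable[[List[Message]], str],
                   n: int = 20,
                   k: int = 6) -> List[Message]:
    """Replace the oldest messages with a single summary system message.

    Folds at most n of the oldest messages and never touches the last k.
    `summarize` is a caller-supplied function (hypothetical here) that
    turns a list of messages into summary text.
    """
    fold_count = min(n, max(len(messages) - k, 0))
    if fold_count == 0:
        # Nothing old enough to fold; return an unchanged copy.
        return list(messages)

    to_fold, rest = messages[:fold_count], messages[fold_count:]
    summary_message = {
        "role": "system",
        "content": SUMMARY_PREFIX + summarize(to_fold),
    }
    # A previous summary, if present, was messages[0] and is therefore
    # part of to_fold, so it is compressed again with the newer chunk.
    return [summary_message] + rest
```

Passing the summarizer in as a parameter keeps the splice logic testable without network access: a stub that returns a fixed string is enough to check the windowing behavior.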