Recipe
Summarize chat history
Compress long multi-turn conversations into a compact running summary so your agent stays under context limits without losing the thread. This recipe shows how to build a sliding-window summarizer on top of the Meridian chat completions endpoint, with model set to azure/model-router.
1. Decide when to summarize
Track the approximate token usage of the live conversation. When it crosses roughly 70% of the model's context window, fold the oldest N messages into a single summary turn and keep the most recent K messages verbatim. Typical defaults: N = 20, K = 6.
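The trigger and the split described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Meridian API: the 128,000-token window, the four-characters-per-token estimate, and the helper names (estimate_tokens, should_summarize, split_for_summary) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]

# Assumptions for illustration only: substitute your model's real context
# limit, and a real tokenizer where accuracy matters.
CONTEXT_WINDOW_TOKENS = 128_000
SUMMARIZE_AT = 0.70  # fraction of the window that triggers a fold
FOLD_FIRST = 20      # N: oldest messages folded into the summary
KEEP_LAST = 6        # K: newest messages always kept verbatim


def estimate_tokens(messages: List[Message]) -> int:
    """Crude estimate: total characters divided by four."""
    return sum(len(m["content"]) for m in messages) // 4


def should_summarize(
    messages: List[Message],
    window: int = CONTEXT_WINDOW_TOKENS,
    threshold: float = SUMMARIZE_AT,
) -> bool:
    """True once the estimated usage reaches the threshold fraction."""
    return estimate_tokens(messages) >= window * threshold


def split_for_summary(
    messages: List[Message], n: int = FOLD_FIRST, k: int = KEEP_LAST
) -> Tuple[List[Message], List[Message]]:
    """Return (to_summarize, to_keep) without touching the last k messages."""
    cut = max(0, min(n, len(messages) - k))
    return messages[:cut], messages[cut:]
```

The character-based estimate is deliberately rough. Because the threshold sits at 70% rather than at the limit, some estimation error is tolerable, but a tokenizer matched to the routed model would be more reliable.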
2. Call the summarizer
Send the chunk to be summarized as a single user message, with an explicit instruction in the system message. Meridian routes the request to the cheapest healthy model that still hits your quality bar.
curl https://llm.getnimbus.net/v1/chat/completions \
  -H "Authorization: Bearer $MERIDIAN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/model-router",
    "messages": [
      {"role": "system", "content": "Summarize the conversation below in <= 200 words. Preserve user goals, decisions made, and open questions."},
      {"role": "user", "content": "<paste prior turns here>"}
    ],
    "max_tokens": 400
  }'
3. Splice the summary back in
Replace the summarized prefix with a single system message: {role: "system", content: "Prior conversation summary: ..."}. Keep the last K verbatim turns after it. On the next overflow, summarize the existing summary together with the messages that have accumulated between it and the last K turns, so context stays bounded.
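The splice and the rolling re-summarization might look like the following Python sketch. It assumes the history is a list of role/content dicts. The summarize argument is a hypothetical callable that posts the request body to the chat completions endpoint and returns the summary text. The names render, build_request, and compact are illustrative, not part of any Meridian SDK.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

SUMMARY_PREFIX = "Prior conversation summary: "
INSTRUCTION = (
    "Summarize the conversation below in <= 200 words. "
    "Preserve user goals, decisions made, and open questions."
)


def render(messages: List[Message]) -> str:
    """Flatten messages into 'role: content' lines for the summarizer."""
    return "\n".join(f'{m["role"]}: {m["content"]}' for m in messages)


def build_request(chunk: List[Message]) -> dict:
    """Request body with the same fields as the curl example in step 2."""
    return {
        "model": "azure/model-router",
        "messages": [
            {"role": "system", "content": INSTRUCTION},
            {"role": "user", "content": render(chunk)},
        ],
        "max_tokens": 400,
    }


def compact(
    history: List[Message],
    summarize: Callable[[dict], str],  # hypothetical: sends the body, returns text
    k: int = 6,
) -> List[Message]:
    """Fold everything before the last k messages into one summary message."""
    cut = len(history) - k
    if cut <= 0:
        return history  # nothing older than the verbatim tail
    prefix, tail = history[:cut], history[cut:]
    summary = summarize(build_request(prefix))
    return [{"role": "system", "content": SUMMARY_PREFIX + summary}] + tail
```

Because an earlier summary sits at the front of the history, a later call to compact folds it into the new summary along with the messages that followed it, so the history never holds more than one summary message plus the last k turns.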