Context Management

Chat history management

Strategies for maintaining coherent conversations without exhausting context windows. Choose between sliding-window precision and summary-based compression.

Sliding-window strategy

Retain the most recent N messages in full fidelity. Older messages are discarded entirely. This approach preserves exact wording and nuance for the active portion of the conversation at the cost of losing long-range context.

Configuration parameters

Window size

N = 20 messages

Keep the last 20 user-assistant pairs in the context buffer.

Overlap

K = 4 messages

Carry forward 4 messages from the previous window to maintain continuity when the window slides.

Best for: short sessions, debugging workflows, and use cases where exact message fidelity matters more than historical context.

Summarize-old strategy

When the conversation exceeds a threshold, older messages are compressed into a running summary. The model sees the summary plus recent messages, preserving semantic continuity while saving tokens.

How it works

1Track total token count. When it exceeds T_max, trigger summarization.
2Pass messages older than the sliding window to a lightweight summarizer model.
3Replace the summarized block with a single system message prefixed with [Summary].
4Append new messages normally. Re-summarize when the token budget is exceeded again.

Best for: long-running sessions, multi-turn reasoning, and conversations where earlier decisions inform later responses.

Recommended hybrid

Combine both strategies: keep the last N messages in full fidelity and maintain a running summary of everything before that window. This gives the model both precise recent context and a compressed view of the full history.

context = [summary] + messages[-N:]

← Back to docs Next: Prompt engineering →