Chat history management
Strategies for maintaining coherent conversations without exhausting context windows. Choose between sliding-window precision and summary-based compression.
Sliding-window strategy
Retain the most recent N messages in full fidelity. Older messages are discarded entirely. This approach preserves exact wording and nuance for the active portion of the conversation at the cost of losing long-range context.
Configuration parameters
N = 20 messages
Keep the last 20 user-assistant pairs in the context buffer.
K = 4 messages
Carry forward 4 messages from the previous window to maintain continuity when the window slides.
Summarize-old strategy
When the conversation exceeds a threshold, older messages are compressed into a running summary. The model sees the summary plus recent messages, preserving semantic continuity while saving tokens.
How it works
- 1Track total token count. When it exceeds
T_max, trigger summarization. - 2Pass messages older than the sliding window to a lightweight summarizer model.
- 3Replace the summarized block with a single system message prefixed with
[Summary]. - 4Append new messages normally. Re-summarize when the token budget is exceeded again.
Recommended hybrid
Combine both strategies: keep the last N messages in full fidelity and maintain a running summary of everything before that window. This gives the model both precise recent context and a compressed view of the full history.