How to keep a long Claude conversation under the context limit
Turn on server-side compaction so a multi-turn chat summarizes its own history before it overflows the window.
A chat that runs for many turns keeps growing, because you resend the full history on every request. Eventually the request approaches the context window, where it either fails or gets expensive. Server-side compaction lets the API summarize earlier turns automatically before you hit the limit, so the conversation keeps going. This guide shows how to enable it and how to handle the one tricky part: appending the full response content to your history.
You will need:
- The Anthropic SDK and an API key
- A model that supports compaction (Opus 4.6+, Sonnet 4.6, or Fable 5)
- A conversation loop that resends message history
Step 1: Enable compaction on the beta endpoint
Compaction is a beta feature. Call the beta messages endpoint, pass the compact beta header, and add a compact edit to context_management. The API will summarize old context when it nears the trigger threshold.
from anthropic import Anthropic

client = Anthropic()
messages = []

def chat(text):
    messages.append({"role": "user", "content": text})
    resp = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-8",
        max_tokens=4000,
        messages=messages,
        context_management={"edits": [{"type": "compact_20260112"}]},
    )
    # Append the FULL content, not just the text, so that any compaction
    # summary the API returned stays in the history sent on the next request.
    messages.append({"role": "assistant", "content": resp.content})
    return resp

Step 2: Confirm compaction is firing
Watch the usage object across turns. As the conversation grows, input_tokens should level off rather than climb forever once compaction kicks in, because the older turns are now represented by a compact summary.
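One way to watch for this is to record input_tokens from each response and flag any turn where the value falls instead of rising. The sketch below is a heuristic, not an official signal: the helper name compaction_turns is hypothetical, and it assumes that without compaction (and without prompt caching) input_tokens only grows from one turn to the next.

```python
def compaction_turns(input_tokens_by_turn):
    """Return the indices of turns whose input_tokens fell below the previous turn's.

    Heuristic only: assumes the full history is resent every turn, so a drop
    suggests that older turns were replaced by a summary.
    """
    flagged = []
    for i in range(1, len(input_tokens_by_turn)):
        if input_tokens_by_turn[i] < input_tokens_by_turn[i - 1]:
            flagged.append(i)
    return flagged


# Illustrative numbers, not measurements. In practice, collect
# resp.usage.input_tokens after each call to chat().
history = [1200, 3900, 7400, 2600, 5100]
print(compaction_turns(history))  # [3]
```

An empty result on a long conversation may simply mean the trigger threshold has not been reached yet.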
Step 3: Know the difference from context editing
Compaction summarizes old turns into a compact block. Context editing instead clears stale tool results or thinking blocks entirely. They are separate features with separate headers. Use compaction for long chats; use context editing for agents that pile up large tool outputs you no longer need.
| Feature | What it does | Beta header |
|---|---|---|
| Compaction | Summarizes earlier history | compact-2026-01-12 |
| Context editing | Clears old tool results or thinking | context-management-2025-06-27 |
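The two rows can be read as two different request configurations. The sketch below builds the extra keyword arguments for each one. The helper name context_kwargs is hypothetical, the compaction values are the ones used in step 1, and the context-editing edit type clear_tool_uses_20250919 is an assumption that should be checked against the current API reference before use.

```python
def context_kwargs(feature):
    """Return the beta header and context_management settings for one feature."""
    if feature == "compaction":
        return {
            "betas": ["compact-2026-01-12"],
            "context_management": {"edits": [{"type": "compact_20260112"}]},
        }
    if feature == "context_editing":
        return {
            "betas": ["context-management-2025-06-27"],
            # Assumed edit type for clearing old tool results; verify before use.
            "context_management": {"edits": [{"type": "clear_tool_uses_20250919"}]},
        }
    raise ValueError(f"unknown feature: {feature}")


print(context_kwargs("compaction")["betas"])
```

The returned dictionary is meant to be unpacked into the request, for example client.beta.messages.create(model=..., max_tokens=..., messages=messages, **context_kwargs("compaction")).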
Result: a 20-turn support session that previously crept toward the window stabilized around 46,800 input tokens once compaction started summarizing the early turns, so the chat kept running without an overflow and without resending the entire transcript at full price.