How to keep a long Claude conversation under the context limit

Turn on server-side compaction so a multi-turn chat summarizes its own history before it overflows the window.

9 min · Advanced

A chat that runs for many turns keeps growing, because you resend the full history on every request. Eventually the history approaches the context window, and requests either fail or become expensive. Server-side compaction lets the API summarize earlier turns automatically before you hit the limit, so the conversation keeps going. This guide shows how to enable it and how to handle the one easy-to-miss detail: appending the full response content to your history.

  • The Anthropic SDK and an API key
  • A model that supports compaction (Opus 4.6+, Sonnet 4.6, or Fable 5)
  • A conversation loop that resends message history

Step 1: Enable compaction on the beta endpoint

Compaction is a beta feature. Call the beta messages endpoint, pass the compact beta header, and add a compact edit to context_management. The API will summarize old context when it nears the trigger threshold.

compact.py
from anthropic import Anthropic

client = Anthropic()
messages = []

def chat(text):
    messages.append({"role": "user", "content": text})
    resp = client.beta.messages.create(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-8",
        max_tokens=4000,
        messages=messages,
        context_management={"edits": [{"type": "compact_20260112"}]},
    )
    # Append the FULL content, not just the text (see the note below)
    messages.append({"role": "assistant", "content": resp.content})
    return resp
Append resp.content, not just the reply text
When compaction runs, the response carries a compaction block inside content. If you append only the text string, you silently drop that block, so the compaction state is lost and the next request has to work from the whole uncompacted history again. Always append the entire content array.
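
To make the failure mode concrete, here is a minimal sketch that uses plain dicts as stand-ins for the blocks in a response's content array. The block shapes (a "compaction" type with a "content" field) and the helper names are assumptions for illustration only; the real SDK returns typed objects, and their exact fields may differ.

```python
# Hypothetical stand-ins for the blocks in a response's content array.
# The fields on a real compaction block are an assumption in this sketch.
sample_content = [
    {"type": "compaction", "content": "Summary of the earlier turns."},
    {"type": "text", "text": "Here is the answer to your latest question."},
]

def text_only_entry(content):
    """Lossy: keeps only the text, so the compaction block is dropped."""
    text = "".join(b["text"] for b in content if b["type"] == "text")
    return {"role": "assistant", "content": text}

def full_entry(content):
    """Keeps every block, including the compaction block."""
    return {"role": "assistant", "content": list(content)}

def has_compaction(entry):
    """True if the history entry still carries a compaction block."""
    content = entry["content"]
    if isinstance(content, str):
        return False
    return any(b.get("type") == "compaction" for b in content)

print(has_compaction(text_only_entry(sample_content)))  # False
print(has_compaction(full_entry(sample_content)))       # True
```

The text-only entry looks harmless in a transcript, which is why the loss is silent: nothing fails until the history grows again.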

Step 2: Confirm compaction is firing

Watch the usage object across turns. As the conversation grows, input_tokens should level off rather than climb forever once compaction kicks in, because the older turns are now represented by a compact summary.

Terminal — input tokens leveling off
turn 5 input_tokens: 18,200
turn 10 input_tokens: 41,900
turn 15 input_tokens: 44,300 <- compaction summarized old turns
turn 20 input_tokens: 46,800
Without compaction the input grows every turn; with it, the curve flattens.
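
One way to check for the flattening programmatically is to compare the per-turn growth between samples. The sketch below uses the numbers from the terminal output above; the helper names and the 0.25 ratio are arbitrary assumptions, not an API feature. In a real loop you would collect the samples from resp.usage.input_tokens after each turn.

```python
def growth_per_turn(samples):
    """Average input-token growth per turn between consecutive samples.

    samples: list of (turn_number, input_tokens) pairs in turn order.
    """
    rates = []
    for (t0, n0), (t1, n1) in zip(samples, samples[1:]):
        rates.append((n1 - n0) / (t1 - t0))
    return rates

def has_flattened(rates, ratio=0.25):
    """Heuristic: the latest growth rate is well below the first one.

    The ratio is an assumed threshold; tune it for your workload.
    """
    return len(rates) >= 2 and rates[-1] < ratio * rates[0]

# Samples taken from the terminal output above.
samples = [(5, 18_200), (10, 41_900), (15, 44_300), (20, 46_800)]
rates = growth_per_turn(samples)
print(rates)                 # [4740.0, 480.0, 500.0]
print(has_flattened(rates))  # True
```

Growth drops from roughly 4,700 tokens per turn to about 500 after turn 10, which matches the point where the summary replaces the older turns.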

Step 3: Know the difference from context editing

Compaction summarizes old turns into a compact block. Context editing instead clears stale tool results or thinking blocks entirely. They are separate features with separate headers. Use compaction for long chats; use context editing for agents that pile up large tool outputs you no longer need.

Feature | What it does | Beta header
Compaction | Summarizes earlier history | compact-2026-01-12
Context editing | Clears old tool results or thinking | context-management-2025-06-27
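
As a rough sketch, the two features differ only in the beta header and the edit type you send. The compaction values below come from Step 1. The context editing edit type (clear_tool_uses_20250919) is not stated in this guide and is an assumption here, as is the helper name; check the current API reference before relying on it.

```python
def context_options(feature):
    """Return the betas and context_management settings for one feature."""
    if feature == "compaction":
        return {
            "betas": ["compact-2026-01-12"],
            "context_management": {"edits": [{"type": "compact_20260112"}]},
        }
    if feature == "context_editing":
        return {
            "betas": ["context-management-2025-06-27"],
            # Assumed edit type for clearing old tool results.
            "context_management": {"edits": [{"type": "clear_tool_uses_20250919"}]},
        }
    raise ValueError(f"unknown feature: {feature}")

print(context_options("compaction")["betas"])       # ['compact-2026-01-12']
print(context_options("context_editing")["betas"])  # ['context-management-2025-06-27']
```

The returned dict can be unpacked into the request, for example client.beta.messages.create(model=..., max_tokens=..., messages=messages, **context_options("compaction")).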
Stream long replies
Long-running conversations often produce long replies. Combine compaction with streaming so a big answer does not hit an HTTP timeout while the history is being compacted.
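
A streaming version of the Step 1 loop might look like the sketch below. It assumes the SDK's beta streaming helper (client.beta.messages.stream) accepts the same betas and context_management arguments as create; the function name stream_chat is hypothetical, and the client is passed in so the function can be exercised with a stub.

```python
def stream_chat(client, messages, text):
    """Send one user turn with compaction enabled and stream the reply."""
    messages.append({"role": "user", "content": text})
    with client.beta.messages.stream(
        betas=["compact-2026-01-12"],
        model="claude-opus-4-8",
        max_tokens=4000,
        messages=messages,
        context_management={"edits": [{"type": "compact_20260112"}]},
    ) as stream:
        # Print text as it arrives instead of waiting for the full reply.
        for chunk in stream.text_stream:
            print(chunk, end="", flush=True)
        final = stream.get_final_message()
    # Append the full content array so any compaction block is kept.
    messages.append({"role": "assistant", "content": final.content})
    return final

# Usage (requires the anthropic package and an API key):
# from anthropic import Anthropic
# history = []
# stream_chat(Anthropic(), history, "Summarize our discussion so far.")
```

The history handling is the same as in the non-streaming loop: the final message's full content is appended, not the concatenated text chunks.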

Result: a 20-turn support session that previously crept toward the context window stabilized around 46,800 input tokens once compaction began summarizing the early turns. The chat kept running without overflowing and without resending the entire transcript at full price.
