How to fix a Claude response that gets cut off mid-sentence
Detect a max_tokens cutoff, raise the limit, and switch to streaming so long outputs finish cleanly.
A reply that stops in the middle of a word or a code block almost always means the model hit the max_tokens cap, not the context window. The cap is a ceiling on how much the model is allowed to write in one response. This guide shows how to confirm the cause and fix it without truncating the user's content.
- The Anthropic SDK and an API key
- The request that produced the truncated reply
- A sense of how long the full answer should be
Step 1: Check stop_reason
The response tells you why it stopped. A value of max_tokens means you hit the cap. A value of end_turn means the model finished on its own and the short answer is intentional. Only the first case needs this fix.
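In application code you usually want a truncated reply to fail loudly rather than pass silently into the rest of your program. The sketch below assumes only that the response object has a stop_reason attribute, as the SDK's Message does. TruncatedResponseError and ensure_complete are hypothetical names chosen for illustration, and SimpleNamespace stands in for real responses so the example runs without an API key.

```python
from types import SimpleNamespace


class TruncatedResponseError(RuntimeError):
    """Hypothetical error raised when a reply stopped at the max_tokens cap."""


def ensure_complete(resp):
    """Return resp unchanged unless its stop_reason says it was cut off."""
    if resp.stop_reason == "max_tokens":
        raise TruncatedResponseError(
            "reply hit the max_tokens cap; raise max_tokens or switch to streaming"
        )
    return resp


# Stand-in objects that carry only the attribute the helper reads.
finished = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

ensure_complete(finished)  # passes through unchanged
try:
    ensure_complete(cut_off)
except TruncatedResponseError as err:
    print("truncated:", err)
```

With the real SDK, you would wrap the call directly: ensure_complete(client.messages.create(...)).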
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
print(resp.stop_reason)  # 'max_tokens' means it was cut off
Step 2: Raise max_tokens to a sane default
For ordinary non-streaming requests, a default near 16000 keeps responses well under SDK HTTP timeouts. Do not lowball it. Only go small for deliberately short outputs like classification.
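One way to stop individual call sites from lowballing the cap is to build every request from a single place that applies the 16000 default. This is a minimal sketch. build_request and DEFAULT_MAX_TOKENS are hypothetical names, and the cap of 20 used for the classification request is only an illustrative small value.

```python
DEFAULT_MAX_TOKENS = 16000  # the non-streaming default suggested in this step


def build_request(prompt: str, model: str = "claude-opus-4-8",
                  max_tokens: int = DEFAULT_MAX_TOKENS) -> dict:
    """Return keyword arguments for client.messages.create with a generous default cap."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


# Long-form task: keeps the default.
guide = build_request("Write a detailed migration guide.")

# Deliberately short task: opts into a small cap explicitly (20 is illustrative).
label = build_request("Label this ticket as bug or feature.", max_tokens=20)

print(guide["max_tokens"], label["max_tokens"])
```

You would then call client.messages.create(**guide). Because the small cap has to be passed explicitly, every short-output request shows up in code review.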
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
Step 3: Stream for very long outputs
If you need more than about 16000 output tokens, a non-streaming call risks a timeout. Switch to streaming and pull the final message at the end. Opus and Fable allow up to 128000 output tokens, but only via streaming.
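The choice between the two request styles can be written as a small decision function. This sketch is based on the thresholds this guide gives: about 16000 tokens for non-streaming calls and 128000 for streaming. The ceiling depends on the model, so treat both constants as assumptions to check against your model's limits. plan_request is a hypothetical name.

```python
NON_STREAMING_LIMIT = 16000  # rough point past which a non-streaming call risks a timeout
STREAMING_CEILING = 128000   # output cap this guide cites for Opus and Fable; model-dependent


def plan_request(expected_output_tokens: int) -> tuple[str, int]:
    """Pick 'create' or 'stream' and a max_tokens value for an expected output size."""
    if expected_output_tokens > STREAMING_CEILING:
        raise ValueError(
            f"{expected_output_tokens} output tokens exceeds the "
            f"{STREAMING_CEILING}-token ceiling assumed here"
        )
    if expected_output_tokens <= NON_STREAMING_LIMIT:
        # Short enough for a plain call; keep the generous default cap.
        return "create", NON_STREAMING_LIMIT
    # Long output: stream, with the cap set to the expected size.
    return "stream", expected_output_tokens


print(plan_request(2000))   # a migration guide
print(plan_request(64000))  # the full handbook
```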
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Write the full handbook."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()
print("\nstopped because:", final.stop_reason)
Result: stop_reason was max_tokens at the 256 cap. Raising it to 16000 finished the guide, and for the full handbook the streaming call at 64000 produced the complete document with a stop_reason of end_turn.
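Printing works for a demo, but most programs need the finished document as a string, together with the reason it stopped. The minimal sketch below uses only the text_stream iterator and the get_final_message() method from Step 3. collect_stream and FakeStream are hypothetical names. FakeStream replaces the SDK's stream object so the example runs without an API key.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join a stream's text deltas and return (full_text, stop_reason).

    Intended to be called inside the `with client.messages.stream(...) as stream:`
    block, where Step 3 also calls get_final_message().
    """
    parts = []
    for text in stream.text_stream:
        parts.append(text)
    final = stream.get_final_message()
    return "".join(parts), final.stop_reason


class FakeStream:
    """Hypothetical stand-in exposing only the two members collect_stream uses."""

    def __init__(self, chunks, stop_reason):
        self.text_stream = iter(chunks)
        self._stop_reason = stop_reason

    def get_final_message(self):
        return SimpleNamespace(stop_reason=self._stop_reason)


text, reason = collect_stream(FakeStream(["# Handbook\n", "Chapter 1 ..."], "end_turn"))
print(reason, len(text))
```

With the real SDK, call collect_stream(stream) inside the with block. Save text only after checking that reason is end_turn rather than max_tokens.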