
How to fix a Claude response that gets cut off mid-sentence

Detect a max_tokens cutoff, raise the limit, and switch to streaming so long outputs finish cleanly.

6 min · Beginner

A reply that stops in the middle of a word or a code block almost always means the model hit the max_tokens cap, not the context window. The cap is a ceiling on how much the model is allowed to write in one response. This guide shows how to confirm the cause and fix it without truncating the user's content.

Prerequisites

The Anthropic SDK and an API key
The request that produced the truncated reply
A sense of how long the full answer should be

Step 1: Check stop_reason

The response tells you why it stopped. A value of max_tokens means you hit the cap. A value of end_turn means the model finished on its own and the short answer is intentional. Only the first case needs this fix.

check.py

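import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,  # deliberately small to reproduce the cutoff
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
print(resp.stop_reason)   # 'max_tokens' means it was cut off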

Terminal — truncated output

$ python check.py

max_tokens

(the text ends mid-sentence: '...first update the model id and')

A stop_reason of max_tokens confirms that the output cap, not the context window, ended the reply.
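The check above can be wrapped in a small helper so every call site tests for truncation the same way. This is a minimal sketch: was_truncated is a hypothetical name, and the SimpleNamespace objects are stand-ins for real responses so the example runs without an API call.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True when the reply ended because it hit the max_tokens cap."""
    return response.stop_reason == "max_tokens"


# Stand-in objects (assumption, for illustration only); a real response from
# client.messages.create(...) exposes the same stop_reason attribute.
cut_off = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")

for name, resp in [("cut_off", cut_off), ("finished", finished)]:
    status = "truncated, raise max_tokens" if was_truncated(resp) else "complete"
    print(f"{name}: {status}")
```

In real code, pass the object returned by client.messages.create to was_truncated before showing the text to a user.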

Step 2: Raise max_tokens to a sane default

For ordinary non-streaming requests, a default near 16000 keeps responses well under SDK HTTP timeouts. Do not lowball it. Only go small for deliberately short outputs like classification.

fixed.py

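import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,  # room for a long answer, still safe for a non-streaming call
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
print(resp.stop_reason)   # expect 'end_turn' once the cap is high enough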
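If some call sites still use a small cap, one possible safety net is to retry once with the larger default when the reply comes back cut off. The sketch below assumes a hypothetical helper, create_with_retry, that accepts any callable shaped like client.messages.create. The fake_create function exists only so the example runs offline.

```python
from types import SimpleNamespace


def create_with_retry(create, params, retry_cap=16000):
    """Call create(**params); if the reply hit the cap, retry once with retry_cap.

    `create` is any callable with the shape of client.messages.create.
    """
    response = create(**params)
    if response.stop_reason == "max_tokens" and params["max_tokens"] < retry_cap:
        # Re-issue the same request with a larger output cap.
        response = create(**{**params, "max_tokens": retry_cap})
    return response


# Fake create() for demonstration only (not part of any SDK): it "finishes"
# when given at least 1000 tokens of room and reports a cutoff otherwise.
def fake_create(**params):
    reason = "end_turn" if params["max_tokens"] >= 1000 else "max_tokens"
    return SimpleNamespace(stop_reason=reason, requested_cap=params["max_tokens"])


result = create_with_retry(fake_create, {"max_tokens": 256})
print(result.stop_reason, result.requested_cap)
```

With a real client you would pass client.messages.create as the first argument. A retry repeats the whole request, so setting a sensible cap up front is usually cheaper than relying on this fallback.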

Step 3: Stream for very long outputs

If you need more than about 16000 output tokens, a non-streaming call risks a timeout. Switch to streaming and pull the final message at the end. Opus and Fable allow up to 128000 output tokens, but only via streaming.

stream.py

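import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Write the full handbook."}],
) as stream:
    # Print each chunk as it arrives instead of waiting for the whole reply.
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()  # assembled message, including stop_reason
print("\nstopped because:", final.stop_reason)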
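The choice between the two call styles can be made in one place. The following sketch is an illustration, not an SDK feature: generate and needs_streaming are hypothetical names, the 16000 threshold is the figure used in this guide, and fake_client is a stand-in so the code runs without network access.

```python
from contextlib import contextmanager
from types import SimpleNamespace

STREAMING_THRESHOLD = 16000  # from this guide: above this, prefer streaming


def needs_streaming(max_tokens: int, threshold: int = STREAMING_THRESHOLD) -> bool:
    """Return True when the requested output cap is large enough to risk a timeout."""
    return max_tokens > threshold


def generate(client, **params):
    """Return the final message, streaming only when the cap calls for it."""
    if needs_streaming(params["max_tokens"]):
        with client.messages.stream(**params) as stream:
            for _ in stream.text_stream:  # drain the stream; print chunks here if desired
                pass
            return stream.get_final_message()
    return client.messages.create(**params)


# Fake client for a dry run (assumption, for illustration only).
@contextmanager
def _fake_stream(**params):
    yield SimpleNamespace(
        text_stream=iter(["chunk ", "chunk"]),
        get_final_message=lambda: SimpleNamespace(stop_reason="end_turn", via="stream"),
    )


def _fake_create(**params):
    return SimpleNamespace(stop_reason="end_turn", via="create")


fake_client = SimpleNamespace(
    messages=SimpleNamespace(create=_fake_create, stream=_fake_stream)
)

print(generate(fake_client, max_tokens=256).via)
print(generate(fake_client, max_tokens=64000).via)
```

Swapping fake_client for a real anthropic.Anthropic() instance, and adding model and messages to the keyword arguments, should give the same routing against the live API.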

Tokens are not characters

A token is neither a character nor a word, and English prose usually takes more tokens than words. A roughly 2000-word answer needs well over 2600 tokens of output. Set max_tokens with headroom above your target length so the model can finish its last paragraph.
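The sizing arithmetic can be sketched as a small function. Both default values are assumptions rather than figures from the API: 1.3 tokens per word is a common rule of thumb for English prose, and the 25% headroom is an arbitrary safety margin. max_tokens_for_words is a hypothetical name.

```python
import math


def max_tokens_for_words(target_words: int,
                         tokens_per_word: float = 1.3,
                         headroom: float = 1.25) -> int:
    """Estimate a max_tokens value for an answer of about target_words words.

    tokens_per_word and headroom are rough assumptions; code and non-English
    text often need a higher ratio.
    """
    estimated = target_words * tokens_per_word
    return math.ceil(estimated * headroom)


print(max_tokens_for_words(2000))
```

For a 2000-word target this lands a little above 3000 tokens, comfortably past the 2600-token floor mentioned above.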

Result: stop_reason was max_tokens at the 256 cap. Raising it to 16000 finished the guide, and for the full handbook the streaming call at 64000 produced the complete document with stop_reason of end_turn.