
How to fix a Claude response that gets cut off mid-sentence

Detect a max_tokens cutoff, raise the limit, and switch to streaming so long outputs finish cleanly.

6 min · Beginner

A reply that stops in the middle of a word or a code block almost always means the model hit the max_tokens cap rather than the context window limit. The cap is a ceiling on how much the model may write in a single response. This guide shows how to confirm the cause and fix it so the full answer reaches the user.

  • The Anthropic SDK and an API key
  • The request that produced the truncated reply
  • A sense of how long the full answer should be

Step 1: Check stop_reason

The response tells you why it stopped. A value of max_tokens means you hit the cap. A value of end_turn means the model finished on its own and the short answer is intentional. Only the first case needs this fix.

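check.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=256,  # deliberately small to reproduce the cutoff
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
print(resp.stop_reason)   # 'max_tokens' means it was cut off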
Terminal — truncated output
$ python check.py
max_tokens
(the text ends mid-sentence: '...first update the model id and')

A stop_reason of max_tokens confirms that the cap, not the context window, ended the reply.
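
The check in Step 1 can be wrapped in a small guard so every call site tests for a cutoff the same way. This is a minimal sketch, assuming a hypothetical helper named was_truncated that only reads the stop_reason attribute described above. SimpleNamespace objects stand in for real responses so the sketch runs without an API key.

```python
from types import SimpleNamespace


def was_truncated(resp) -> bool:
    """Return True when the response stopped because it hit the max_tokens cap."""
    return getattr(resp, "stop_reason", None) == "max_tokens"


# Stand-in objects for illustration only; these are not real API responses.
cut_off = SimpleNamespace(stop_reason="max_tokens")
finished = SimpleNamespace(stop_reason="end_turn")

print(was_truncated(cut_off))   # True
print(was_truncated(finished))  # False
```

With a real response, the same call would be was_truncated(resp) right after client.messages.create(...).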

Step 2: Raise max_tokens to a sane default

For ordinary non-streaming requests, a default near 16000 gives most answers room to finish while keeping response times well under the SDK's HTTP timeouts. Do not set the cap low just to be cautious. Reserve small values for deliberately short outputs such as classification.

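fixed.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,  # raised from 256 so the guide can finish
    messages=[{"role": "user", "content": "Write a detailed migration guide."}],
)
print(resp.stop_reason)   # expect 'end_turn' once the cap is high enough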
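
One way to keep a cutoff from slipping through unnoticed is to fail loudly when a reply still hits the cap. The sketch below is an illustration under stated assumptions, not part of the SDK: create_or_raise, TruncatedResponseError, and the fake client are all hypothetical names, and the fake client exists only so the example runs without a network call.

```python
from types import SimpleNamespace

DEFAULT_MAX_TOKENS = 16000  # the non-streaming default suggested in Step 2


class TruncatedResponseError(RuntimeError):
    """Hypothetical error raised when a reply still hits the max_tokens cap."""


def create_or_raise(client, max_tokens=DEFAULT_MAX_TOKENS, **request):
    """Send a non-streaming request and raise if the reply was cut off."""
    resp = client.messages.create(max_tokens=max_tokens, **request)
    if resp.stop_reason == "max_tokens":
        raise TruncatedResponseError(
            f"reply hit the {max_tokens}-token cap; consider streaming with a higher limit"
        )
    return resp


# Fake client for demonstration only: it reports a cutoff when the cap is under 1000.
class _FakeMessages:
    def create(self, max_tokens, **request):
        reason = "max_tokens" if max_tokens < 1000 else "end_turn"
        return SimpleNamespace(stop_reason=reason)


fake_client = SimpleNamespace(messages=_FakeMessages())

ok = create_or_raise(fake_client, model="example-model", messages=[])
print(ok.stop_reason)  # end_turn
```

Passing a real client in place of fake_client would apply the same check to live responses. Raising rather than silently retrying is a design choice here: it makes the caller decide whether to move to streaming.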

Step 3: Stream for very long outputs

If you need more than about 16000 output tokens, a non-streaming call risks a timeout. Switch to streaming and pull the final message at the end. Opus and Fable allow up to 128000 output tokens, but only via streaming.

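stream.py
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Write the full handbook."}],
) as stream:
    # Print each chunk of text as it arrives
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # The assembled message carries the final stop_reason
    final = stream.get_final_message()
print("\nstopped because:", final.stop_reason)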

Tokens are not characters
The cap is counted in tokens, not characters or words. In English text a token is usually shorter than a word, so a roughly 2000-word answer needs about 2600 output tokens or more. Set max_tokens with headroom above your target length so the model can finish its last paragraph.
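
The sizing advice above can be turned into a quick estimate. This is a rough sketch, assuming about 1.3 tokens per English word (consistent with the 2000-word, 2600-token figure above) and a 25% headroom factor. Both numbers and the helper name max_tokens_for_words are illustrative assumptions, not values from the API.

```python
import math


def max_tokens_for_words(target_words, tokens_per_word=1.3, headroom=1.25):
    """Estimate a max_tokens value for a target word count.

    tokens_per_word and headroom are assumed rules of thumb, not measured values.
    """
    return math.ceil(target_words * tokens_per_word * headroom)


print(max_tokens_for_words(2000))  # 3250
```

Actual token counts vary with language and content (code and non-English text often use more tokens per word), so treat the result as a floor and round up generously.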

Result: stop_reason was max_tokens at the 256 cap. Raising it to 16000 finished the guide, and for the full handbook the streaming call at 64000 produced the complete document with stop_reason of end_turn.
