How to cap spend on a Claude agent with a task budget
Give an agentic loop a token budget so the model paces itself and finishes gracefully instead of running away.
An agent that loops through many tool calls can burn far more tokens than you expected before it finishes. A task budget tells the model how many tokens it has for the whole loop. The model sees a running countdown and self-moderates, wrapping up gracefully rather than being cut off. This is different from max_tokens, which is a hard per-response ceiling the model is not aware of.
- The Anthropic SDK and an API key
- A model that supports task budgets (Opus 4.7, Opus 4.8, or Fable 5)
- An agentic loop with tools
Step 1: Add task_budget to the request
Task budgets are a beta feature. Call the beta streaming endpoint, pass the task budget beta flag in betas, and set task_budget inside output_config. The total must be at least 20000 tokens. Stream the response so that a long-running request with a large max_tokens does not trip an HTTP timeout.
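Since the total has a stated minimum, it can help to check the value before sending the request. The sketch below is a hypothetical helper: build_output_config and MIN_TASK_BUDGET are illustrative names, not part of the SDK. It only mirrors the output_config shape used in this step.

```python
MIN_TASK_BUDGET = 20_000  # minimum total stated in this step


def build_output_config(total: int, effort: str = "high") -> dict:
    """Return an output_config dict with a token task budget (hypothetical helper)."""
    if total < MIN_TASK_BUDGET:
        raise ValueError(
            f"task budget total must be at least {MIN_TASK_BUDGET}, got {total}"
        )
    return {"effort": effort, "task_budget": {"type": "tokens", "total": total}}


print(build_output_config(64_000))
```

Failing locally on a too-small total gives a clearer error than waiting for the request to be rejected.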
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# `tools` is your existing list of tool definitions for the agent.
with client.beta.messages.stream(
    model="claude-opus-4-8",
    max_tokens=128000,
    betas=["task-budgets-2026-03-13"],
    output_config={
        "effort": "high",
        # Budget for the whole agentic loop, not for a single response.
        "task_budget": {"type": "tokens", "total": 64000},
    },
    messages=[{"role": "user", "content": "Audit the repo and list issues."}],
    tools=tools,
) as stream:
    response = stream.get_final_message()

Step 2: Track spend across the loop
To show progress, accumulate output_tokens from usage across iterations, plus the tokens of the tool results you feed back in. Leave the budget's remaining field unset in a normal loop; the server tracks the countdown itself.
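One way to keep that accounting in one place is a small tracker object. This is a sketch under assumptions: BudgetTracker is a hypothetical helper, the tool-result token counts are numbers you supply yourself (for example from your own estimate), and the result is a local figure for display only, not the server's countdown.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    """Local, display-only estimate of spend against a task budget (hypothetical)."""

    total: int
    output_tokens: int = 0
    tool_result_tokens: int = 0

    def record(self, output_tokens: int, tool_result_tokens: int = 0) -> None:
        # Call once per loop iteration with that iteration's numbers.
        self.output_tokens += output_tokens
        self.tool_result_tokens += tool_result_tokens

    @property
    def spent(self) -> int:
        return self.output_tokens + self.tool_result_tokens

    def summary(self) -> str:
        pct = 100 * self.spent / self.total
        return f"{self.spent}/{self.total} tokens ({pct:.0f}%)"


tracker = BudgetTracker(total=64000)
tracker.record(output_tokens=1200, tool_result_tokens=800)   # iteration 1
tracker.record(output_tokens=3000, tool_result_tokens=1000)  # iteration 2
print(tracker.summary())
```

In a real loop you would pass response.usage.output_tokens as output_tokens after each response.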
spent = 0  # initialize once, before the loop starts

# Inside the loop, after each response:
spent += response.usage.output_tokens
print(f"output so far: {spent} tokens")

Step 3: Tune the budget per task type
Set a generous budget for open-ended exploration and a tight one for latency-sensitive jobs. If the budget is too small for the task, the model may finish less thoroughly and say it ran out of budget. Measure a few real runs, then set the total a bit above the median.
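The "a bit above the median" rule can be sketched as a small function. The helper name suggest_budget, the 15% headroom, the rounding to the next 1,000, and the sample run totals are all illustrative assumptions, not recommendations from the API.

```python
import statistics

MIN_TASK_BUDGET = 20_000  # minimum total stated in Step 1


def suggest_budget(observed_totals: list[int], headroom: float = 1.15) -> int:
    """Suggest a total a bit above the median of measured runs (hypothetical helper)."""
    if not observed_totals:
        raise ValueError("need at least one measured run")
    median = statistics.median(observed_totals)
    suggestion = int(median * headroom)
    # Round up to the next 1,000 and never go below the stated minimum.
    suggestion = -(-suggestion // 1000) * 1000
    return max(suggestion, MIN_TASK_BUDGET)


# Made-up token totals from five measured runs of the same task.
runs = [48_000, 52_000, 61_000, 55_000, 97_000]
print(suggest_budget(runs))
```

Using the median rather than the mean keeps one runaway run (the 97,000 here) from inflating the budget for every other run.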
Result: the repo audit agent that previously ran past 100,000 output tokens before stopping now paces itself to a 64,000-token budget, delivering a complete issue list and a clean end_turn instead of an abrupt cutoff.