How to halve Claude costs for bulk jobs with the Batch API
Send non-urgent work through the Message Batches API to get 50 percent off standard token prices.
If you have thousands of requests that do not need an answer in the next second, sending them one at a time at full price is the expensive way to do it. The Message Batches API runs them asynchronously at half the standard token price. Most batches finish within an hour. This guide walks through creating a batch, polling it, and reading results.
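To make the saving concrete, here is a minimal back-of-the-envelope sketch. The per-million-token prices and the token counts per request are placeholder assumptions, not real rates; substitute the current prices for your model. `estimate_cost` is a hypothetical helper, not part of the SDK.

```python
def estimate_cost(requests, input_tokens_each, output_tokens_each,
                  input_price_per_mtok, output_price_per_mtok, batch=False):
    """Estimate total cost in dollars; prices are per million tokens."""
    input_cost = requests * input_tokens_each * input_price_per_mtok / 1_000_000
    output_cost = requests * output_tokens_each * output_price_per_mtok / 1_000_000
    total = input_cost + output_cost
    # The Batch API charges half the standard token price.
    return total * 0.5 if batch else total


# Placeholder numbers: 5,000 requests, 200 input and 20 output tokens each,
# at assumed prices of $1.00 (input) and $5.00 (output) per million tokens.
live = estimate_cost(5000, 200, 20, 1.00, 5.00)
batched = estimate_cost(5000, 200, 20, 1.00, 5.00, batch=True)
print(f"live ${live:.2f}, batch ${batched:.2f}")
```

The absolute figures depend entirely on the assumed prices; the ratio between the two is what the batch discount guarantees.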
- The Anthropic SDK and an API key
- A list of independent requests that can wait minutes to hours
- A unique custom_id for each request so you can match results
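The uniqueness requirement in the last item can be checked locally before anything is submitted. A minimal sketch, assuming you already have the list of ids you plan to use; `duplicate_ids` is a hypothetical helper name, not an SDK function.

```python
from collections import Counter


def duplicate_ids(custom_ids):
    """Return the sorted list of ids that appear more than once."""
    counts = Counter(custom_ids)
    return sorted(cid for cid, n in counts.items() if n > 1)


# Illustrative ids: "ticket-1" is deliberately repeated.
ids = [f"ticket-{i}" for i in range(3)] + ["ticket-1"]
dupes = duplicate_ids(ids)
if dupes:
    print("fix these before submitting:", dupes)
```

Catching a clash here is cheaper than discovering after the batch ends that two results cannot be told apart.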
Step 1: Build the batch with custom ids
Each entry has a custom_id and the same params you would pass to a normal message call. Results come back in any order, so the custom_id is how you reconnect each answer to its input.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# tickets is assumed to be your own list of ticket texts
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"ticket-{i}",  # unique per request
            "params": {
                "model": "claude-haiku-4-5",
                "max_tokens": 64,
                "messages": [{"role": "user", "content": f"Classify: {text}"}],
            },
        }
        for i, text in enumerate(tickets)
    ]
)
print(batch.id, batch.processing_status)

Step 2: Poll until it ends
Check processing_status on an interval until it reads ended. The request_counts field tells you how many succeeded or errored as it runs.
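One hedged variation on interval polling is to wrap it in a helper with a deadline, so a stuck job cannot block a script forever. This is a sketch, not the SDK's own mechanism: `wait_until_ended` and its `get_status` callable are hypothetical, and the 24-hour default timeout is an assumed ceiling you should adjust. The demo at the bottom runs offline against fake statuses.

```python
import time


def wait_until_ended(get_status, interval=60.0, timeout=24 * 3600,
                     sleep=time.sleep, clock=time.monotonic):
    """Call get_status() every `interval` seconds until it returns "ended".

    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "ended":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout} seconds")
        sleep(interval)


# Offline demo: fake statuses stand in for real API responses.
fake_statuses = iter(["in_progress", "in_progress", "ended"])
print(wait_until_ended(lambda: next(fake_statuses), interval=0, sleep=lambda s: None))
```

With a real client, `get_status` could be `lambda: client.messages.batches.retrieve(batch.id).processing_status`. The plain loop that follows is the simplest form of the same idea.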
import time

while True:
    b = client.messages.batches.retrieve(batch.id)
    if b.processing_status == "ended":
        break
    print("still processing:", b.request_counts.processing)
    time.sleep(60)  # wait a minute between checks

Step 3: Read results keyed by custom_id
Stream the results and key them by custom_id. Never assume order. Handle each result type so a single failure does not lose the rest of the batch.
labels = {}
for r in client.messages.batches.results(batch.id):
    if r.result.type == "succeeded":
        labels[r.custom_id] = r.result.message.content[0].text
    elif r.result.type == "errored":
        labels[r.custom_id] = f"ERROR: {r.result.error.type}"
    else:
        # "canceled" or "expired": record it so no request goes missing
        labels[r.custom_id] = f"SKIPPED: {r.result.type}"
print(len(labels), "results collected")

Result: 5,000 ticket classifications ran as a single batch at half the token price of live calls, finished in under an hour, and came back keyed by ticket id with zero errors.
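Because results arrive in any order, the last practical step is joining them back to the inputs. A minimal sketch, assuming ids were built as `ticket-{i}` from the input list's index, as in Step 1; `join_by_custom_id` is a hypothetical helper and the sample tickets and labels are illustrative only.

```python
def join_by_custom_id(tickets, labels, prefix="ticket-"):
    """Pair each input text with its label, in input order.

    Returns (rows, missing): rows are (custom_id, text, label) tuples,
    missing lists the ids that have no entry in labels.
    """
    rows, missing = [], []
    for i, text in enumerate(tickets):
        cid = f"{prefix}{i}"
        if cid in labels:
            rows.append((cid, text, labels[cid]))
        else:
            missing.append(cid)
    return rows, missing


# Illustrative data: labels deliberately out of order, one id absent.
tickets = ["Refund not received", "App crashes on login", "Change my email"]
labels = {"ticket-1": "bug", "ticket-0": "billing"}
rows, missing = join_by_custom_id(tickets, labels)
for cid, text, label in rows:
    print(cid, label, text)
print("missing:", missing)
```

Any id reported as missing never produced a result line and is a candidate for resubmission.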