How to halve Claude costs for bulk jobs with the Batch API

Send non-urgent work through the Message Batches API to get 50 percent off standard token prices.

8 min · Intermediate

If you have thousands of requests that do not need an answer in the next second, sending them one at a time at full price is the expensive way to do it. The Message Batches API runs them asynchronously at half the standard token price. Most batches finish within an hour. This guide walks through creating a batch, polling it, and reading results.

Prerequisites
  • The Anthropic SDK and an API key
  • A list of independent requests that can wait minutes to hours
  • A unique custom_id for each request so you can match results

When batching fits
Good fits: nightly classification, bulk summarization, dataset labeling, large-scale evaluations. Bad fits: a live chat reply or anything else a user is waiting on.

Step 1: Build the batch with custom ids

Each entry has a custom_id and the same params you would pass to a normal message call. Results come back in any order, so the custom_id is how you reconnect each answer to its input.

create_batch.py
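from anthropic import Anthropic

# Sample input; replace with your own list of ticket texts.
tickets = ["Refund not received", "App crashes on login"]

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One entry per ticket; custom_id ties each result back to its input.
batch = client.messages.batches.create(requests=[
    {
        "custom_id": f"ticket-{i}",
        "params": {
            "model": "claude-haiku-4-5",
            "max_tokens": 64,
            "messages": [{"role": "user", "content": f"Classify: {text}"}],
        },
    }
    for i, text in enumerate(tickets)
])
print(batch.id, batch.processing_status)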
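
Because results are matched only by custom_id, a duplicate id would make two answers impossible to tell apart. The following is a minimal sketch of a builder that checks uniqueness before anything is submitted. The helper name build_requests and the (id, text) input shape are assumptions for illustration, not part of the SDK.

```python
from collections import Counter

def build_requests(tickets, model="claude-haiku-4-5", max_tokens=64):
    """Build batch entries from (ticket_id, text) pairs (hypothetical helper)."""
    requests = [
        {
            "custom_id": f"ticket-{ticket_id}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": f"Classify: {text}"}],
            },
        }
        for ticket_id, text in tickets
    ]
    # Refuse to continue if any custom_id appears more than once.
    counts = Counter(r["custom_id"] for r in requests)
    duplicates = sorted(cid for cid, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate custom_id values: {duplicates}")
    return requests

# Illustrative data only.
sample = [(101, "Refund not received"), (102, "App crashes on login")]
requests = build_requests(sample)
print([r["custom_id"] for r in requests])
```

The returned list can be passed as the requests argument of client.messages.batches.create. Using a stable identifier from your own data, rather than a list index, keeps the ids meaningful if the input list is later reordered.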

Step 2: Poll until it ends

Check processing_status on an interval until it reads ended. The request_counts field tells you how many succeeded or errored as it runs.

poll.py
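import time

# Continues from create_batch.py: reuses `client` and `batch`.
while True:
    b = client.messages.batches.retrieve(batch.id)
    if b.processing_status == "ended":
        break
    print("still processing:", b.request_counts.processing)
    time.sleep(60)

# Final tally, matching the last line of the terminal output.
print("ended succeeded:", b.request_counts.succeeded, "errored:", b.request_counts.errored)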
Terminal — batch progress
$ python poll.py
still processing: 4120
still processing: 1890
still processing: 0
ended succeeded: 5000 errored: 0
Poll on an interval; ended means results are ready to stream.
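
An unbounded while True loop will wait forever if something goes wrong upstream. One possible variation, sketched here under assumptions, wraps the poll in a function with an overall timeout. The name wait_for_batch, its parameters, and the 24-hour default are illustrative choices. The retrieve argument stands in for client.messages.batches.retrieve so the sketch can run without network access.

```python
import time
from types import SimpleNamespace

def wait_for_batch(retrieve, batch_id, interval=60.0, timeout=24 * 3600, sleep=time.sleep):
    """Poll retrieve(batch_id) until processing_status is "ended" (hypothetical helper)."""
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        if waited >= timeout:
            raise TimeoutError(
                f"batch {batch_id} still {batch.processing_status} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval

# Stand-in for the real retrieve call: reports "in_progress" twice, then "ended".
_states = iter(["in_progress", "in_progress", "ended"])

def fake_retrieve(batch_id):
    return SimpleNamespace(id=batch_id, processing_status=next(_states))

done = wait_for_batch(fake_retrieve, "batch_demo", interval=1, sleep=lambda s: None)
print(done.processing_status)
```

With a real client, the call would look like wait_for_batch(client.messages.batches.retrieve, batch.id).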

Step 3: Read results keyed by custom_id

Stream the results and key them by custom_id. Never assume order. Handle each result type so a single failure does not lose the rest of the batch.

results.py
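# Continues from poll.py: reuses `client` and `batch`.
labels = {}
for r in client.messages.batches.results(batch.id):
    if r.result.type == "succeeded":
        labels[r.custom_id] = r.result.message.content[0].text
    elif r.result.type == "errored":
        labels[r.custom_id] = f"ERROR: {r.result.error.type}"
    else:
        # "canceled" or "expired": record it so the request is not silently dropped.
        labels[r.custom_id] = f"SKIPPED: {r.result.type}"
print(len(labels), "results collected")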
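
For larger jobs it can help to separate successes from requests that need another attempt. The sketch below tallies results by type and lists every custom_id that either did not succeed or never came back. The summarize helper and the fake result objects are assumptions for illustration. The fakes only mimic the custom_id, result.type, and result.message.content[0].text attributes that results.py reads.

```python
from collections import Counter
from types import SimpleNamespace

def summarize(results, expected_ids):
    """Return (labels, counts by result type, ids to retry) (hypothetical helper)."""
    labels, counts, retry, seen = {}, Counter(), [], set()
    for r in results:
        seen.add(r.custom_id)
        counts[r.result.type] += 1
        if r.result.type == "succeeded":
            labels[r.custom_id] = r.result.message.content[0].text
        else:
            retry.append(r.custom_id)  # errored, canceled, or expired
    # Ids that were submitted but never appeared in the results.
    missing = sorted(set(expected_ids) - seen)
    return labels, counts, retry + missing

def fake_result(custom_id, result_type, text=None):
    """Build an object shaped like one streamed result, for demonstration only."""
    message = SimpleNamespace(content=[SimpleNamespace(text=text)]) if text else None
    return SimpleNamespace(
        custom_id=custom_id,
        result=SimpleNamespace(type=result_type, message=message),
    )

demo = [
    fake_result("t-1", "succeeded", "billing"),
    fake_result("t-2", "errored"),
    fake_result("t-3", "expired"),
]
labels, counts, retry = summarize(demo, ["t-1", "t-2", "t-3", "t-4"])
print(labels, dict(counts), retry)
```

With a real batch, the first argument would be client.messages.batches.results(batch.id), and the retry list could feed a follow-up batch.
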
Stack the savings
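Batch pricing is half off, and prompt caching still applies inside a batch, so the two discounts stack. Because batch requests are processed concurrently and in any order, cache hits are best-effort. Putting an identical cached prefix at the start of every request improves the hit rate on large jobs.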
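
As a sketch of what a shared prefix might look like, the builder below puts the same system block, marked with cache_control, at the start of every request. The helper name build_cached_requests and the sample instructions are hypothetical. A real prefix has to meet the model's minimum cacheable length, which this short sample would not.

```python
def build_cached_requests(shared_instructions, tickets,
                          model="claude-haiku-4-5", max_tokens=64):
    """Build batch entries that all start with the same cacheable system block."""
    system = [
        {
            "type": "text",
            "text": shared_instructions,
            # Marks the end of the prefix that should be cached.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return [
        {
            "custom_id": f"ticket-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "system": system,
                "messages": [{"role": "user", "content": f"Classify: {text}"}],
            },
        }
        for i, text in enumerate(tickets)
    ]

# Illustrative data only.
cached_requests = build_cached_requests(
    "You label support tickets as billing, bug, or other.",
    ["Refund not received", "App crashes on login"],
)
print(len(cached_requests), cached_requests[0]["params"]["system"][0]["cache_control"])
```

Only the user message differs between entries, so everything before it is byte-for-byte identical, which is what a cache hit requires.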

Result: 5,000 ticket classifications that would have cost full price as live calls ran as one batch at half the token price, finished in under an hour, and came back keyed by ticket id with zero errors.
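
To see what half the token price means for a job of this size, here is a rough worked estimate. Every number is a placeholder assumption: the per-request token counts and the per-million-token prices are invented for the arithmetic and should be replaced with measured usage and current published prices. The function name estimate_cost is also hypothetical.

```python
def estimate_cost(n_requests, input_tokens_each, output_tokens_each,
                  input_price_per_mtok, output_price_per_mtok, batch_discount=0.5):
    """Return (standard_cost, batch_cost) in dollars for a uniform job."""
    input_cost = n_requests * input_tokens_each / 1_000_000 * input_price_per_mtok
    output_cost = n_requests * output_tokens_each / 1_000_000 * output_price_per_mtok
    standard = input_cost + output_cost
    return standard, standard * (1 - batch_discount)

# Placeholder inputs: 5,000 requests, 200 input and 20 output tokens each,
# at hypothetical prices of $1.00 and $5.00 per million tokens.
standard, batched = estimate_cost(5000, 200, 20, 1.00, 5.00)
print(f"standard: ${standard:.2f}  batch: ${batched:.2f}")
```

The estimate ignores prompt caching, which would lower the input side further when a shared prefix is reused.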

Tags
#cost #batch #claude #bulk