
How to count tokens before sending a prompt to Claude

Use the count_tokens endpoint to measure a prompt accurately so you never guess at size or cost.

5 min · Beginner

Guessing token counts leads to surprise overflows and surprise bills. The Anthropic API has a dedicated count_tokens endpoint that returns the exact input token count for a given model. It costs nothing to call and takes one request. This guide shows how to count a string, a file, and the difference between two versions of a file.

The Anthropic SDK for Python or Node, or the ant CLI
An ANTHROPIC_API_KEY in your environment
The text, file, or messages you want to measure

Do not use tiktoken

tiktoken is OpenAI's tokenizer. It undercounts Claude tokens by roughly 15 to 20 percent on plain text and far more on code. Any number from tiktoken or gpt-tokenizer is wrong for Claude.

Step 1: Count a single string

Pass the text as a user message and read input_tokens off the response. Always pass the same model id you intend to call, because counts are model-specific.

tokens.py

from anthropic import Anthropic

client = Anthropic()
resp = client.messages.count_tokens(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": "How many tokens is this sentence?"}],
)
print(resp.input_tokens)

Terminal — count result

$ python tokens.py

input_tokens is the only number you need here.
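If you want to measure a whole conversation rather than one string, the same call can take the full messages list. The sketch below wraps it in a helper; the function name count_conversation_tokens is hypothetical, and it assumes the SDK's count_tokens accepts an optional system parameter alongside model and messages. The client is passed in, so nothing is sent until you call the helper.

```python
from typing import Any, Optional


def count_conversation_tokens(
    client: Any,
    model: str,
    messages: list[dict],
    system: Optional[str] = None,
) -> int:
    """Return input_tokens for a list of messages, optionally with a system prompt.

    `client` is expected to be an anthropic.Anthropic() instance (assumption:
    its messages.count_tokens accepts `model`, `messages`, and `system`).
    """
    kwargs: dict[str, Any] = {"model": model, "messages": messages}
    if system is not None:
        # Only include the system field when a system prompt is supplied.
        kwargs["system"] = system
    resp = client.messages.count_tokens(**kwargs)
    return resp.input_tokens


# Example call with a real client (not executed here):
#   from anthropic import Anthropic
#   n = count_conversation_tokens(
#       Anthropic(),
#       "claude-opus-4-8",
#       [{"role": "user", "content": "How many tokens is this sentence?"}],
#       system="You are a concise assistant.",
#   )
```

Taking the client as an argument keeps the helper easy to test with a stub and lets you reuse one client across many counts.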

Step 2: Count a whole file

Read the file into the content field. This is the standard way to check whether a document will fit before you commit to a full generation call.

tokens_file.py

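from pathlib import Path
from anthropic import Anthropic

client = Anthropic()
text = Path("CLAUDE.md").read_text(encoding="utf-8")  # whole file as one string
resp = client.messages.count_tokens(
    model="claude-opus-4-8",
    messages=[{"role": "user", "content": text}],
)
print(resp.input_tokens)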
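To measure the difference between two versions of a file, one simple approach is to count each version separately and subtract. This is a minimal sketch, not an official recipe: the helper names count_file_tokens and token_delta are hypothetical, and the client is assumed to be an Anthropic() instance like the one above.

```python
from pathlib import Path
from typing import Any


def count_file_tokens(client: Any, model: str, path: str) -> int:
    """Count input tokens for one file sent as a single user message."""
    text = Path(path).read_text(encoding="utf-8")
    resp = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": text}],
    )
    return resp.input_tokens


def token_delta(client: Any, model: str, old_path: str, new_path: str) -> int:
    """Return new minus old: positive means the new version is larger."""
    old_count = count_file_tokens(client, model, old_path)
    new_count = count_file_tokens(client, model, new_path)
    return new_count - old_count


# Example call with a real client (file names are placeholders):
#   from anthropic import Anthropic
#   print(token_delta(Anthropic(), "claude-opus-4-8", "CLAUDE.old.md", "CLAUDE.md"))
```

Use the same model id for both counts, since a delta computed across two different models is not meaningful.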

Step 3: Count from the command line

If you prefer the terminal, the ant CLI exposes the same endpoint. The @ prefix inlines a file into the content field, and --transform pulls out just the number.

zsh - tokens

$ ant messages count-tokens --model claude-opus-4-8 \
    --message '{role: user, content: "@./CLAUDE.md"}' \
    --transform input_tokens -r

18432

Step 4: Turn tokens into a cost estimate

Once you have the input count, multiply by the model's input rate. Opus 4.8 costs 5 dollars per million input tokens, so 18,432 input tokens cost about 0.09 dollars (18,432 / 1,000,000 × 5) on the way in, before you add the cost of the output you will generate.

estimate.py

input_tokens = 18432
cost = input_tokens / 1_000_000 * 5.00   # Opus 4.8 input rate
print(f"about ${cost:.4f} for input")
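If you also want to budget for output, the same arithmetic extends to a second term. The sketch below is illustrative only: estimate_cost is a hypothetical helper, and because this guide does not state an output rate, that rate is left as a parameter you must fill in from the current pricing page.

```python
def estimate_cost(
    input_tokens: int,
    input_rate_per_million: float,
    output_tokens: int = 0,
    output_rate_per_million: float = 0.0,
) -> float:
    """Return the estimated cost in dollars for one request.

    Rates are dollars per million tokens. The output rate defaults to 0.0
    because it is an assumption the caller has to supply.
    """
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Input-only estimate using the numbers from this step.
cost = estimate_cost(18432, 5.00)
print(f"about ${cost:.4f} for input")
```

The output token count is only a guess until the response comes back, so treat the second term as an upper bound based on the max_tokens you plan to request.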

Result: the file measured 18,432 tokens against Opus 4.8, comfortably inside the 1M window, at roughly nine cents of input cost. You now know the request will fit and what it will cost before sending a single generation call.
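The fit check in that result can be written down as a small pre-flight test. This is a sketch under stated assumptions: the helper names are hypothetical, the 1,000,000-token default comes from the window size quoted above, and reserved_output assumes that the output you request has to fit in the same window as the input.

```python
def remaining_tokens(input_tokens: int, context_window: int = 1_000_000) -> int:
    """Tokens left in the window after the input is accounted for."""
    return context_window - input_tokens


def fits_in_window(
    input_tokens: int,
    context_window: int = 1_000_000,
    reserved_output: int = 0,
) -> bool:
    """True if the input plus the output you plan to request fits the window."""
    return input_tokens + reserved_output <= context_window


# The file measured in this guide, with a hypothetical 8,000-token output budget.
print(fits_in_window(18432, reserved_output=8000))
print(remaining_tokens(18432))
```

Check the documented window for the exact model you call rather than relying on the default, since window sizes differ between models.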