How to choose the right Claude model for cost and quality
Match Haiku, Sonnet, Opus, or Fable to your task so you neither overpay nor underpower the job.
Reaching for the most expensive model on every task wastes money, and reaching for the cheapest on a hard task wastes time on bad answers. This guide gives a simple way to pick a Claude model by matching the task to the right tier, then verifying with a quick test.
- A clear description of the task you are automating
- The Anthropic SDK and an API key for a quick test call
- A rough sense of your monthly request volume
Step 1: Learn the current lineup and prices
Prices are in US dollars per million tokens, listed separately for input and output. Cheaper models are faster and well suited to simple, high-volume work. More capable models cost more but handle ambiguity and long agentic tasks better.
| Model | Input $/1M | Output $/1M | Best for |
|---|---|---|---|
| claude-haiku-4-5 | 1.00 | 5.00 | Simple, high-volume, latency-sensitive |
| claude-sonnet-4-6 | 3.00 | 15.00 | Balanced speed and intelligence |
| claude-opus-4-8 | 5.00 | 25.00 | Hard reasoning, long agentic work |
| claude-fable-5 | 10.00 | 50.00 | The most demanding reasoning |
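To see what the table means for a budget, the sketch below turns the listed prices into a rough monthly estimate. The workload figures (100,000 requests a month, 500 input and 100 output tokens per request) are made-up assumptions, and `monthly_cost` is a hypothetical helper, so substitute your own numbers.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-fable-5": (10.00, 50.00),
}


def monthly_cost(model, requests, input_tokens, output_tokens):
    """Estimate monthly spend in USD, assuming every request has the same token counts."""
    input_price, output_price = PRICES[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return requests * per_request


# Hypothetical workload: 100,000 requests a month, 500 input and 100 output tokens each.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100_000, 500, 100):,.2f}")
```

With this assumed workload the estimate scales directly with the price column, so moving up one tier multiplies the bill by the same ratio as the prices.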
Step 2: Classify your task
Sort the job into one bucket. Classification, tagging, short extraction, and simple chat go to Haiku. Most everyday generation, summarization, and tool use go to Sonnet. Multi-step coding, deep research, and tasks where a wrong answer is costly go to Opus. Reserve Fable for the genuinely hardest, long-horizon work where its higher price is justified.
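The buckets above can be written down as a small lookup so the starting choice is explicit and easy to review. This is only a sketch: the task-type labels are illustrative names rather than an official taxonomy, `starting_model` is a hypothetical helper, and falling back to Sonnet for unlisted tasks is an assumption.

```python
HAIKU = "claude-haiku-4-5"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-8"
FABLE = "claude-fable-5"

# Starting tier per task type, following the buckets described above.
# The keys are illustrative labels chosen for this sketch.
STARTING_MODEL = {
    "classification": HAIKU,
    "tagging": HAIKU,
    "short_extraction": HAIKU,
    "simple_chat": HAIKU,
    "generation": SONNET,
    "summarization": SONNET,
    "tool_use": SONNET,
    "multi_step_coding": OPUS,
    "deep_research": OPUS,
    "costly_if_wrong": OPUS,
    "hardest_long_horizon": FABLE,
}


def starting_model(task_type: str) -> str:
    """Return the first model to try; unlisted task types fall back to Sonnet (an assumption)."""
    return STARTING_MODEL.get(task_type, SONNET)


print(starting_model("tagging"))
print(starting_model("deep_research"))
```

The result is a starting point for testing, not a final decision.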
Step 3: Run the same prompt on two tiers
Do not pick from the table alone. Run your real prompt on the cheaper candidate and on the next tier up, then compare the outputs. If the cheaper one is good enough, you have just cut your cost. Keep the cheaper model only when its output passes your quality bar; otherwise move up a tier.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
prompt = "Classify this ticket as billing, bug, or feature: 'app crashes on login'"

# Send the same prompt to the cheaper candidate and the next tier up.
for model in ["claude-haiku-4-5", "claude-sonnet-4-6"]:
    r = client.messages.create(
        model=model,
        max_tokens=64,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reply text is in the first content block of the response.
    print(model, "->", r.content[0].text)

Result: the ticket classifier produced identical labels on Haiku and Sonnet, so it stayed on Haiku at one third of Sonnet's input cost. A separate refactor task was visibly weaker on Sonnet and moved to Opus 4.8, where the diff was correct on the first pass.
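A single prompt is a thin basis for a decision, so one way to make "passes your bar" concrete is to score each candidate on a handful of labeled examples. The sketch below is a minimal, assumed approach: `pass_rate`, `pick_model`, the 0.95 bar, the sample tickets, and the stand-in classifiers are all hypothetical. The stand-ins make no API calls and say nothing about how real models behave.

```python
from typing import Callable, Optional


def pass_rate(classify: Callable[[str], str], examples: list[tuple[str, str]]) -> float:
    """Fraction of labeled examples on which classify returns the expected label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    hits = sum(1 for text, expected in examples if classify(text).strip().lower() == expected)
    return hits / len(examples)


def pick_model(candidates, make_classifier, examples, bar=0.95) -> Optional[str]:
    """Return the first candidate (listed cheapest first) that meets the bar, else None."""
    for model in candidates:
        if pass_rate(make_classifier(model), examples) >= bar:
            return model
    return None


# Hypothetical labeled tickets.
EXAMPLES = [
    ("app crashes on login", "bug"),
    ("charged twice this month", "billing"),
]


def make_stub(model):
    """Stand-in classifiers for illustration; a real one would call the API for `model`."""
    if model == "cheap-stub":
        return lambda text: "bug"  # always answers "bug", so it misses the billing ticket
    return lambda text: "billing" if "charged" in text else "bug"


print(pick_model(["cheap-stub", "bigger-stub"], make_stub, EXAMPLES))  # prints "bigger-stub"
```

To use it for real, replace `make_stub` with a function that returns a closure around `client.messages.create` for the given model and reads the label from `r.content[0].text`.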