How to choose the right Claude model for cost and quality

Match Haiku, Sonnet, Opus, or Fable to your task so you neither overpay nor underpower the job.

7 min · Beginner

Reaching for the most expensive model on every task wastes money, and reaching for the cheapest on a hard task wastes time on bad answers. This guide gives a simple way to pick a Claude model by matching the task to the right tier, then verifying with a quick test.

Before you start, have these ready:
A clear description of the task you are automating
The Anthropic SDK and an API key for a quick test call
A rough sense of your monthly request volume

Step 1: Learn the current lineup and prices

Prices are in dollars per million tokens, listed separately for input and output. Cheaper models are faster and fine for simple, high-volume work. More capable models cost more but handle ambiguity and long agentic tasks.

Model	Input $/1M	Output $/1M	Best for
claude-haiku-4-5	1.00	5.00	Simple, high-volume, latency-sensitive
claude-sonnet-4-6	3.00	15.00	Balanced speed and intelligence
claude-opus-4-8	5.00	25.00	Hard reasoning, long agentic work
claude-fable-5	10.00	50.00	The most demanding reasoning
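To turn the table into a monthly figure, multiply your average token counts by the per-million prices. The sketch below is one way to do that arithmetic. The prices are copied from the table, while the function name `monthly_cost` and the workload numbers (100,000 requests, 500 input and 100 output tokens each) are hypothetical placeholders to replace with your own.

```python
# Prices in dollars per million tokens (input, output), copied from the table above.
PRICES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-8": (5.00, 25.00),
    "claude-fable-5": (10.00, 50.00),
}


def monthly_cost(model, requests, input_tokens, output_tokens):
    """Estimate dollars per month for `requests` calls of the given average size."""
    input_price, output_price = PRICES[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return requests * per_request


# Hypothetical workload: 100,000 requests a month, 500 input and 100 output tokens each.
for name in PRICES:
    print(f"{name}: ${monthly_cost(name, 100_000, 500, 100):,.2f}")
```

For this assumed workload the estimate comes to $100.00 on Haiku, $300.00 on Sonnet, $500.00 on Opus, and $1,000.00 on Fable, so the gap between tiers scales directly with volume.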

Step 2: Classify your task

Sort the job into one bucket. Classification, tagging, short extraction, and simple chat go to Haiku. Most everyday generation, summarization, and tool use go to Sonnet. Multi-step coding, deep research, and tasks where a wrong answer is costly go to Opus. Reserve Fable for the genuinely hardest, long-horizon work where its higher price is justified.

Notes — task to model map

tag support tickets by topic -> haiku-4-5

draft weekly newsletter -> sonnet-4-6

refactor a module across files -> opus-4-8

multi-day autonomous research -> fable-5 (only if needed)

Write your task on the left and the bucket it falls into on the right.
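If you keep the map in code rather than in notes, a plain dictionary is enough. This is a minimal sketch: the bucket names (`simple`, `everyday`, `hard`, `hardest`) and the helper `starting_model` are made up for illustration, and the bucket for each task is still assigned by hand.

```python
# Bucket -> model, following the classification in Step 2.
MODEL_FOR_BUCKET = {
    "simple": "claude-haiku-4-5",     # classification, tagging, short extraction, simple chat
    "everyday": "claude-sonnet-4-6",  # generation, summarization, tool use
    "hard": "claude-opus-4-8",        # multi-step coding, deep research, costly mistakes
    "hardest": "claude-fable-5",      # long-horizon work, only if needed
}

# Task -> bucket, using the example tasks from the notes above.
TASKS = {
    "tag support tickets by topic": "simple",
    "draft weekly newsletter": "everyday",
    "refactor a module across files": "hard",
    "multi-day autonomous research": "hardest",
}


def starting_model(task):
    """Return the model to try first for a task listed in TASKS."""
    return MODEL_FOR_BUCKET[TASKS[task]]


for task in TASKS:
    print(f"{task} -> {starting_model(task)}")
```

Treat the result as a starting candidate only; the comparison in Step 3 decides whether the task stays on that tier.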

Step 3: Run the same prompt on two tiers

Do not pick from the table alone. Run your real prompt on the cheaper candidate and the next tier up, then compare. If the cheaper one is good enough, you just cut your cost. Default to the cheaper model only when its output passes your bar.

compare.py

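from anthropic import Anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = Anthropic()
prompt = "Classify this ticket as billing, bug, or feature: 'app crashes on login'"

# Send the identical prompt to the cheaper candidate and the next tier up.
for model in ["claude-haiku-4-5", "claude-sonnet-4-6"]:
    r = client.messages.create(
        model=model, max_tokens=64,
        messages=[{"role": "user", "content": prompt}],
    )
    # Print the first content block of each reply for a side-by-side read.
    print(model, "->", r.content[0].text)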
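A single prompt is a thin basis for a decision, so it can help to score each model against a handful of examples with known answers. The sketch below shows one way to express "passes your bar" as a number. Everything in it is an assumption for illustration: the helper names `pass_rate` and `meets_bar`, the 0.95 threshold, the three sample tickets, and `fake_classify`, which stands in for a real model call so the example runs offline.

```python
def pass_rate(classify, examples):
    """Fraction of (text, expected_label) pairs where classify(text) matches.

    Expected labels must be lowercase; model output is trimmed and lowercased.
    """
    hits = sum(
        1 for text, expected in examples
        if classify(text).strip().lower() == expected
    )
    return hits / len(examples)


def meets_bar(classify, examples, bar=0.95):
    """True when the classifier's pass rate reaches the chosen bar."""
    return pass_rate(classify, examples) >= bar


# Hypothetical labeled examples; replace with real tickets from your own data.
EXAMPLES = [
    ("app crashes on login", "bug"),
    ("charged twice this month", "billing"),
    ("please add dark mode", "feature"),
]


def fake_classify(text):
    # Stand-in for a real model call so the sketch runs without network access.
    if "crash" in text:
        return "Bug"
    if "charged" in text:
        return "billing"
    return "feature"


print(pass_rate(fake_classify, EXAMPLES), meets_bar(fake_classify, EXAMPLES))
```

In practice `classify` would wrap the `client.messages.create` call from compare.py for one model, and you would run `meets_bar` once per tier, starting with the cheaper one.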

Set effort instead of upgrading

On Opus 4.6 and later and on Sonnet 4.6, you can set output_config.effort to low, medium, high, xhigh, or max. A high-effort Sonnet often beats a low-effort Opus at lower cost, so tune effort before you jump a tier.
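One way to try different effort levels is to build the request arguments in a small helper and pass them to the same call used in compare.py. This is a sketch under assumptions: the helper name `request_kwargs` is hypothetical, and it sends the field through the Python SDK's generic `extra_body` argument on the assumption that the API accepts `output_config.effort` in the request body as described above. Check the current API reference for the exact field and the values your model supports.

```python
def request_kwargs(model, prompt, effort=None, max_tokens=64):
    """Build keyword arguments for client.messages.create, optionally with effort."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if effort is not None:
        # Assumption: effort is accepted as output_config.effort in the request body.
        kwargs["extra_body"] = {"output_config": {"effort": effort}}
    return kwargs


prompt = "Classify this ticket as billing, bug, or feature: 'app crashes on login'"
print(request_kwargs("claude-sonnet-4-6", prompt, effort="high"))
```

With a client from the SDK, the call would then look like `client.messages.create(**request_kwargs("claude-sonnet-4-6", prompt, effort="high"))`, which lets you compare effort levels on one model before comparing tiers.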

Result: the ticket classifier produced identical labels on Haiku and Sonnet, so it stayed on Haiku at one third the input cost (1.00 versus 3.00 per million tokens). The refactor task was visibly weaker on Sonnet and moved to Opus 4.8, where the diff was correct on the first pass.