In this article
Claude API pricing (2026): per-token costs for Opus, Sonnet and Haiku, with real examples
Anthropic charges for the Claude API per million tokens, separately from any claude.ai subscription. As of May 2026verified 2026-05-30, the current models run Opus 4.8 at $5 input / $25 output, Sonnet 4.6 at $3 / $15, and Haiku 4.5 at $1 / $5 per million tokens. This guide gives you the full dated rate card, the discounts that actually move your bill (caching and batch), worked dollar math for real workloads, and an honest per-token comparison against OpenAI and Gemini. The numbers churn fast, so we date-stamp every figure and tell you where to re-check it.
What is Claude API pricing?
Claude API pricing is a usage-based rate card that bills you per token sent to and returned from a Claude model, with no subscription and no per-seat fee. A token is the unit of billing: you pay one rate for input tokens (everything you send, including the system prompt, conversation history, and any attached documents) and a separate, higher rate for output tokens (everything the model generates). Both are quoted per million tokens, abbreviated MTok.
This is a different product from the consumer plans. The $20/month Pro plan and the Free plan on claude.com cover the claude.ai chat app and Claude Code, billed as a flat subscription. The API is metered per token through the Claude Developer Platform console and has nothing to do with which subscription you carry. If you are choosing between consumer tiers instead, our Claude pricing plans guide breaks down Free, Pro, Max, Team, and Enterprise.
Three things drive every Claude API bill: which model you call, how many input and output tokens each call uses, and which discounts (caching, batching) you apply. The rest of this guide takes them in that order.
How much does each Claude model cost per million tokens?
Each current Claude model has a fixed per-million-token rate for input and a higher rate for output, and across the lineup output costs roughly five times input. The table below is the current-generation rate card, verified against Anthropic's published pricing on 2026-05-30verified 2026-05-30. This is the citable dataset: standard (non-cached, non-batch) rates per MTok.
| Model | Input / MTok | Output / MTok | Tier |
|---|---|---|---|
| Opus 4.8 (flagship) | $5 | $25 | Top reasoning |
| Opus 4.7 | $5 | $25 | Top reasoning |
| Opus 4.6 | $5 | $25 | Top reasoning |
| Opus 4.5 | $5 | $25 | Top reasoning |
| Sonnet 4.6 (current Sonnet) | $3 | $15 | Balanced |
| Sonnet 4.5 | $3 | $15 | Balanced |
| Haiku 4.5 (current Haiku) | $1 | $5 | Fast / cheap |
| Opus 4.1 (legacy) | $15 | $75 | Legacy Opus |
| Haiku 3.5 (legacy) | $0.80 | $4 | Legacy Haiku |
Every standard rate above is verified 2026-05-30verified 2026-05-30 against the Claude pricing documentation. Three things are worth flagging:
- The current Opus tier is much cheaper than the legacy one. Opus 4.5 through 4.8 all sit at $5 / $25 per MTok. The older Opus 4.1 still lists at $15 / $75, three times the current rate. If your code pins an old Opus version, you may be paying triple for no reason. (Our migration notes live below in the effective-cost section.)
- Haiku 3.5 is retired on the first-party API but remains available on Amazon Bedrock and Google Cloud Vertex AI at $0.80 / $4. For new builds, Haiku 4.5 at $1 / $5 is the supported current model.
- The flagship name moves. At the time of writing the top Opus resolves to Opus 4.8 at $5 / $25, but Anthropic ships new versions frequently. The $5 / $25 price is firmly sourced; if you need the exact current flagship name, confirm the top row on claude.com/pricing at the moment you build.
What are the 5 cost levers that reduce your Claude API bill?
Cost levers are the five techniques that change what you actually pay without changing which model you call, and stacked together they routinely cut a real bill by more than half. Sticker price is the starting point, not the final number. Here is each lever and the saving it delivers.
1. Prompt caching (up to ~90% off reused input)
Prompt caching stores a stable prefix of your prompt (system instructions, a schema, retrieved documents) so repeated calls pay a steep discount on that portion. A cache read (a hit) costs about 0.1x the base input rate, so Opus cached input drops from $5 to roughly $0.50 per MTokverified 2026-05-30, around a 90 percent saving on the cached tokens. You pay a one-time premium to write the cache: 1.25x for a 5-minute cache ($6.25/MTok on Opus) or 2x for a 1-hour cache ($10/MTok on Opus). Caching pays off the moment the same context is reused more than a couple of times, which is most chatbots, agents, and RAG systems. The prompt caching docs have the full mechanics.
2. The Batch API (flat 50% off)
The Batch API processes requests asynchronously within a 24-hour window for a flat 50 percent discount on both input and output. Opus 4.8 batch runs $2.50 / $12.50 per MTok, Sonnet 4.6 runs $1.50 / $7.50, and Haiku 4.5 runs $0.50 / $2.50. There is no quality trade-off; you only give up immediate response time. For evaluation runs, bulk classification, overnight enrichment, or any non-interactive job, batching halves the bill for free.
3. Model routing (pay flagship rates only when you must)
Routing sends each request to the cheapest model that can handle it. If 80 percent of your traffic is simple and can run on Haiku at $1 / $5, and only 20 percent needs Opus at $5 / $25, a router that classifies difficulty first means you pay the flagship rate on one request in five instead of all five. On a mixed workload this often beats every other lever, because the input-rate gap between Haiku and Opus is 5x.
4. Context management (stop paying for tokens you do not need)
Because you pay for every input token on every call, conversation history and bloated system prompts compound. Trimming history, summarizing old turns, and retrieving only the documents a request actually needs cuts input volume directly. Caching the stable part and trimming the variable part are complementary, not competing.
5. Data residency (a small premium, not a discount)
This lever moves cost the other way. Some current models support US-only inference through the inference_geo setting at a 1.1x multiplier. If compliance requires US-only processing, budget roughly 10 percent above the standard rate on the models that support it. It is worth naming because a residency requirement quietly raises the floor under all your other math.
How they compound: these stack multiplicatively. Take a RAG chatbot on Opus. Cache the retrieved documents and system prompt (lever 1) and most of your input cost drops ~90 percent. Route simple follow-ups to Haiku (lever 3) and a chunk of traffic leaves Opus entirely. Batch the nightly re-indexing (lever 2) and that job halves. The sticker rate of $5 / $25 ends up describing only the small, real-time, uncached slice of your usage.
What does the Claude API actually cost for real workloads?
Real-workload cost is the dollar figure you get when you multiply realistic token counts by the per-MTok rate, and it is usually far smaller than people fear. The docs give you rates; they do not give you dollars per workload. Here are three worked examples at May 2026 ratesverified 2026-05-30. Token counts are illustrative estimates, so treat the dollar figures as planning numbers, not invoices.
Example 1: a customer-support chatbot (Haiku 4.5)
Assume a support bot handling 50,000 conversations a month. Each conversation averages 2,000 input tokens (system prompt plus a few turns) and 400 output tokens. That is 100M input tokens and 20M output tokens per month.
- Input: 100 MTok x $1 = $100
- Output: 20 MTok x $5 = $100
- Total: ~$200/month on Haiku 4.5, before caching.
Cache the shared system prompt (say 1,200 of those 2,000 input tokens) and most of the input cost drops to the 0.1x read rate. Input falls toward $20 to $30, taking the total to roughly $120 to $130/month. The same workload on Opus 4.8 would be five times the input and output rate, around $1,000/month before caching, which is exactly why support bots run on Haiku.
Example 2: a RAG knowledge assistant (Sonnet 4.6)
Assume a documentation assistant doing 10,000 queries a month, each retrieving 8,000 tokens of context plus a 1,000-token system prompt (9,000 input) and producing 600 output tokens. That is 90M input and 6M output per month.
- Input: 90 MTok x $3 = $270
- Output: 6 MTok x $15 = $90
- Total: ~$360/month on Sonnet 4.6, before caching.
The retrieved context is the cost driver, and it is highly cacheable. With a 1-hour cache on the document set, repeated reads cost 0.1x input. In a workload where popular documents are queried many times, cached input can fall by half or more, taking the bill toward $180 to $220/month. This is the canonical case where caching changes the economics rather than shaving the edges.
Example 3: a coding agent (Opus 4.8)
Assume a coding agent running 1,000 substantial tasks a month, each consuming 40,000 input tokens (codebase context, files, tool results) and generating 8,000 output tokens. That is 40M input and 8M output per month.
- Input: 40 MTok x $5 = $200
- Output: 8 MTok x $25 = $200
- Total: ~$400/month on Opus 4.8, before caching.
Coding agents re-send large, stable context (the same files and instructions) across many tool-use turns, so caching is unusually effective here, often cutting input cost by well over half. If you are deciding between metered API access and a flat Claude Code subscription for this kind of work, our best AI coding assistants comparison covers when each pricing model wins.
Is the Claude API cheaper than OpenAI and Gemini?
On raw per-token sticker price, Claude is competitive but rarely the outright cheapest; it tends to win on output quality per dollar for coding and long-document work rather than on the number itself. Provider rate cards move constantly, so the honest answer is "compare the specific tiers you would actually use, on the date you build." Below is the Claude side of that comparison, with the structural points that matter more than any single cell.
| Claude model | Input / MTok | Output / MTok | Best-fit job |
|---|---|---|---|
| Haiku 4.5 | $1 | $5 | Classification, routing, extraction |
| Sonnet 4.6 | $3 | $15 | RAG, general assistants, balanced work |
| Opus 4.8 | $5 | $25 | Coding agents, hard reasoning, long docs |
All Claude rates above are verified 2026-05-30verified 2026-05-30. When you put this next to OpenAI's API pricing and Google's Gemini API pricing, three patterns hold more reliably than the individual numbers:
- The cheapest absolute tier is usually a small model from Google or OpenAI, not Claude. If your task is trivial classification at massive scale and quality is fungible, Claude is not where you minimize cost.
- In the mid and flagship tiers the three providers price within the same band. Sonnet at $3 / $15 and Opus at $5 / $25 are not outliers; comparable models from OpenAI and Google land near them. The per-token gap is rarely the deciding factor.
- Claude's edge is quality per dollar on hard tasks. On coding and long-context reasoning, Claude often produces a usable result in fewer iterations, which lowers effective cost even when the sticker rate is similar or slightly higher. That is the real comparison, and it is workload-specific.
For a feature-level breakdown of the chat products rather than the APIs, see our ChatGPT vs Claude vs Gemini comparison. The short version: pick the model on the task, not the provider on the brand.
Why does Claude Opus cost more than the rate card suggests?
Effective cost is the real per-task dollar figure once tokenizer behavior, output length, and version choice are factored in, and for Opus it can run above the sticker $5 / $25. The rate card is honest, but three things quietly inflate what a task costs.
The tokenizer changed. Opus 4.7 and later use a new tokenizer that may use up to roughly 35 percent more tokens for the same text. The per-token price held steady, but if each request now contains more tokens, the per-request cost rises accordingly. When you compare an Opus 4.6 estimate against an Opus 4.8 estimate, the price column looks identical while the real bill can be higher on the newer model for identical inputs. Budget a margin when migrating up.
Fast mode is a separate, higher rate. The flagship offers a fast mode (a research preview) priced well above standard, on the order of $10 input / $50 out for the current flagship and higher on some prior Opus versions. It is not the default, but if you enable it for latency, your effective rate doubles or more. Know which mode you are calling.
Version pinning can trap you on legacy pricing. Opus 4.1 still lists at $15 / $75, three times the current Opus tier. Code that pins a model string from a year ago can be paying legacy rates silently. The fix is migrating to a current model version and re-running your cost estimate against the new tokenizer.
Get the Claude Pricing and Plans Cheat Sheet (2026)
Every consumer plan, every API model rate, and the caching and batch discounts on one printable PDF. Updated each quarter as the rate card moves.
Bottom line: who the Claude API is and is not worth it for
The Claude API is a metered, per-token service that is worth it when you are building a product on top of a model and want strong reasoning per dollar, and not worth it when a flat subscription or a cheaper provider tier would do the same job. The pricing is transparent: Opus 4.8 at $5 / $25, Sonnet 4.6 at $3 / $15, and Haiku 4.5 at $1 / $5 per million tokens, with output roughly 5x input across the board.
Worth it for: developers shipping chatbots, RAG assistants, and coding agents who will use caching and batching, and who value Claude's coding and long-document quality. At realistic token counts, a serious workload often runs in the low hundreds of dollars per month, not the thousands people fear.
Not worth it for: anyone who just wants to chat with Claude (use the $20/month Pro plan instead, the API is more expensive and more work), and anyone running massive trivial-classification volume where a cheaper small model from another provider wins on raw price.
If you are running these costs as part of a business, API spend is a deductible software expense; our network covers self-employed tax deductions including the AI tools you build on. And whichever model you land on, re-check the live rate card before you budget: at the time of writing, the figures here are current as of 2026-05-30verified 2026-05-30, but Anthropic ships new versions often. Confirm at claude.com/pricing and the pricing docs.