Researched by Vincent Wesley CoueyMay 2026 · 13 min read
In this article
  1. What is Claude API pricing?
  2. Per-million-token rate table
  3. The 5 cost levers
  4. Real-workload cost examples
  5. Claude vs OpenAI and Gemini
  6. Effective cost vs sticker cost
  7. Bottom line
  8. FAQ
Last reviewed: May 2026 Next review: August 2026

Claude API pricing (2026): per-token costs for Opus, Sonnet and Haiku, with real examples

Anthropic charges for the Claude API per million tokens, separately from any claude.ai subscription. As of May 2026verified 2026-05-30, the current models run Opus 4.8 at $5 input / $25 output, Sonnet 4.6 at $3 / $15, and Haiku 4.5 at $1 / $5 per million tokens. This guide gives you the full dated rate card, the discounts that actually move your bill (caching and batch), worked dollar math for real workloads, and an honest per-token comparison against OpenAI and Gemini. The numbers churn fast, so we date-stamp every figure and tell you where to re-check it.

Claude API price per million tokens (2026) Claude API price per million tokens (2026) Input Output $0 $5 $10 $15 $20 $25 USD per million tokens $5 $25 Opus 4.8 $3 $15 Sonnet 4.6 $1 $5 Haiku 4.5
Standard API rate per million tokens, input versus output, for the current Claude models. verified 2026-05-30
Disclosure: Nesyona is reader-supported and runs display ads. Anthropic, OpenAI, and Google do not run affiliate programs for their APIs, so every vendor link here is a plain primary-source link with no tracking or commission. Prices are quoted as of the dates shown; always confirm live rates before you budget. Full policy.
In this guide
What is Claude API pricing? How much does each Claude model cost per million tokens? What are the 5 cost levers that reduce your bill? What does the Claude API actually cost for real workloads? Is the Claude API cheaper than OpenAI and Gemini? Why does Opus cost more than the rate card suggests? Bottom line: who the Claude API is and is not worth it for

What is Claude API pricing?

Claude API pricing is a usage-based rate card that bills you per token sent to and returned from a Claude model, with no subscription and no per-seat fee. A token is the unit of billing: you pay one rate for input tokens (everything you send, including the system prompt, conversation history, and any attached documents) and a separate, higher rate for output tokens (everything the model generates). Both are quoted per million tokens, abbreviated MTok.

This is a different product from the consumer plans. The $20/month Pro plan and the Free plan on claude.com cover the claude.ai chat app and Claude Code, billed as a flat subscription. The API is metered per token through the Claude Developer Platform console and has nothing to do with which subscription you carry. If you are choosing between consumer tiers instead, our Claude pricing plans guide breaks down Free, Pro, Max, Team, and Enterprise.

Three things drive every Claude API bill: which model you call, how many input and output tokens each call uses, and which discounts (caching, batching) you apply. The rest of this guide takes them in that order.

How much does each Claude model cost per million tokens?

Each current Claude model has a fixed per-million-token rate for input and a higher rate for output, and across the lineup output costs roughly five times input. The table below is the current-generation rate card, verified against Anthropic's published pricing on 2026-05-30verified 2026-05-30. This is the citable dataset: standard (non-cached, non-batch) rates per MTok.

ModelInput / MTokOutput / MTokTier
Opus 4.8 (flagship)$5$25Top reasoning
Opus 4.7$5$25Top reasoning
Opus 4.6$5$25Top reasoning
Opus 4.5$5$25Top reasoning
Sonnet 4.6 (current Sonnet)$3$15Balanced
Sonnet 4.5$3$15Balanced
Haiku 4.5 (current Haiku)$1$5Fast / cheap
Opus 4.1 (legacy)$15$75Legacy Opus
Haiku 3.5 (legacy)$0.80$4Legacy Haiku

Every standard rate above is verified 2026-05-30verified 2026-05-30 against the Claude pricing documentation. Three things are worth flagging:

What are the 5 cost levers that reduce your Claude API bill?

Cost levers are the five techniques that change what you actually pay without changing which model you call, and stacked together they routinely cut a real bill by more than half. Sticker price is the starting point, not the final number. Here is each lever and the saving it delivers.

1. Prompt caching (up to ~90% off reused input)

Prompt caching stores a stable prefix of your prompt (system instructions, a schema, retrieved documents) so repeated calls pay a steep discount on that portion. A cache read (a hit) costs about 0.1x the base input rate, so Opus cached input drops from $5 to roughly $0.50 per MTokverified 2026-05-30, around a 90 percent saving on the cached tokens. You pay a one-time premium to write the cache: 1.25x for a 5-minute cache ($6.25/MTok on Opus) or 2x for a 1-hour cache ($10/MTok on Opus). Caching pays off the moment the same context is reused more than a couple of times, which is most chatbots, agents, and RAG systems. The prompt caching docs have the full mechanics.

2. The Batch API (flat 50% off)

The Batch API processes requests asynchronously within a 24-hour window for a flat 50 percent discount on both input and output. Opus 4.8 batch runs $2.50 / $12.50 per MTok, Sonnet 4.6 runs $1.50 / $7.50, and Haiku 4.5 runs $0.50 / $2.50. There is no quality trade-off; you only give up immediate response time. For evaluation runs, bulk classification, overnight enrichment, or any non-interactive job, batching halves the bill for free.

3. Model routing (pay flagship rates only when you must)

Routing sends each request to the cheapest model that can handle it. If 80 percent of your traffic is simple and can run on Haiku at $1 / $5, and only 20 percent needs Opus at $5 / $25, a router that classifies difficulty first means you pay the flagship rate on one request in five instead of all five. On a mixed workload this often beats every other lever, because the input-rate gap between Haiku and Opus is 5x.

4. Context management (stop paying for tokens you do not need)

Because you pay for every input token on every call, conversation history and bloated system prompts compound. Trimming history, summarizing old turns, and retrieving only the documents a request actually needs cuts input volume directly. Caching the stable part and trimming the variable part are complementary, not competing.

5. Data residency (a small premium, not a discount)

This lever moves cost the other way. Some current models support US-only inference through the inference_geo setting at a 1.1x multiplier. If compliance requires US-only processing, budget roughly 10 percent above the standard rate on the models that support it. It is worth naming because a residency requirement quietly raises the floor under all your other math.

How they compound: these stack multiplicatively. Take a RAG chatbot on Opus. Cache the retrieved documents and system prompt (lever 1) and most of your input cost drops ~90 percent. Route simple follow-ups to Haiku (lever 3) and a chunk of traffic leaves Opus entirely. Batch the nightly re-indexing (lever 2) and that job halves. The sticker rate of $5 / $25 ends up describing only the small, real-time, uncached slice of your usage.

What does the Claude API actually cost for real workloads?

Real-workload cost is the dollar figure you get when you multiply realistic token counts by the per-MTok rate, and it is usually far smaller than people fear. The docs give you rates; they do not give you dollars per workload. Here are three worked examples at May 2026 ratesverified 2026-05-30. Token counts are illustrative estimates, so treat the dollar figures as planning numbers, not invoices.

Example 1: a customer-support chatbot (Haiku 4.5)

Assume a support bot handling 50,000 conversations a month. Each conversation averages 2,000 input tokens (system prompt plus a few turns) and 400 output tokens. That is 100M input tokens and 20M output tokens per month.

Cache the shared system prompt (say 1,200 of those 2,000 input tokens) and most of the input cost drops to the 0.1x read rate. Input falls toward $20 to $30, taking the total to roughly $120 to $130/month. The same workload on Opus 4.8 would be five times the input and output rate, around $1,000/month before caching, which is exactly why support bots run on Haiku.

Example 2: a RAG knowledge assistant (Sonnet 4.6)

Assume a documentation assistant doing 10,000 queries a month, each retrieving 8,000 tokens of context plus a 1,000-token system prompt (9,000 input) and producing 600 output tokens. That is 90M input and 6M output per month.

The retrieved context is the cost driver, and it is highly cacheable. With a 1-hour cache on the document set, repeated reads cost 0.1x input. In a workload where popular documents are queried many times, cached input can fall by half or more, taking the bill toward $180 to $220/month. This is the canonical case where caching changes the economics rather than shaving the edges.

Example 3: a coding agent (Opus 4.8)

Assume a coding agent running 1,000 substantial tasks a month, each consuming 40,000 input tokens (codebase context, files, tool results) and generating 8,000 output tokens. That is 40M input and 8M output per month.

Coding agents re-send large, stable context (the same files and instructions) across many tool-use turns, so caching is unusually effective here, often cutting input cost by well over half. If you are deciding between metered API access and a flat Claude Code subscription for this kind of work, our best AI coding assistants comparison covers when each pricing model wins.

The output-token rule of thumb Across every example, output tokens cost five times input tokens. If a bill looks high, the fastest lever after caching is usually trimming output length: capping max tokens, asking for terse responses, and avoiding "explain your reasoning" on tasks that do not need it.

Is the Claude API cheaper than OpenAI and Gemini?

On raw per-token sticker price, Claude is competitive but rarely the outright cheapest; it tends to win on output quality per dollar for coding and long-document work rather than on the number itself. Provider rate cards move constantly, so the honest answer is "compare the specific tiers you would actually use, on the date you build." Below is the Claude side of that comparison, with the structural points that matter more than any single cell.

Claude modelInput / MTokOutput / MTokBest-fit job
Haiku 4.5$1$5Classification, routing, extraction
Sonnet 4.6$3$15RAG, general assistants, balanced work
Opus 4.8$5$25Coding agents, hard reasoning, long docs

All Claude rates above are verified 2026-05-30verified 2026-05-30. When you put this next to OpenAI's API pricing and Google's Gemini API pricing, three patterns hold more reliably than the individual numbers:

For a feature-level breakdown of the chat products rather than the APIs, see our ChatGPT vs Claude vs Gemini comparison. The short version: pick the model on the task, not the provider on the brand.

Why does Claude Opus cost more than the rate card suggests?

Effective cost is the real per-task dollar figure once tokenizer behavior, output length, and version choice are factored in, and for Opus it can run above the sticker $5 / $25. The rate card is honest, but three things quietly inflate what a task costs.

The tokenizer changed. Opus 4.7 and later use a new tokenizer that may use up to roughly 35 percent more tokens for the same text. The per-token price held steady, but if each request now contains more tokens, the per-request cost rises accordingly. When you compare an Opus 4.6 estimate against an Opus 4.8 estimate, the price column looks identical while the real bill can be higher on the newer model for identical inputs. Budget a margin when migrating up.

Fast mode is a separate, higher rate. The flagship offers a fast mode (a research preview) priced well above standard, on the order of $10 input / $50 out for the current flagship and higher on some prior Opus versions. It is not the default, but if you enable it for latency, your effective rate doubles or more. Know which mode you are calling.

Version pinning can trap you on legacy pricing. Opus 4.1 still lists at $15 / $75, three times the current Opus tier. Code that pins a model string from a year ago can be paying legacy rates silently. The fix is migrating to a current model version and re-running your cost estimate against the new tokenizer.

The honest read Opus is worth its premium when a task genuinely needs top-tier reasoning and a correct first answer saves you iterations. It is not worth it for bulk work a router could send to Haiku, and the tokenizer change means you should re-estimate, not assume parity, whenever you move up a version.

Get the Claude Pricing and Plans Cheat Sheet (2026)

Every consumer plan, every API model rate, and the caching and batch discounts on one printable PDF. Updated each quarter as the rate card moves.

Bottom line: who the Claude API is and is not worth it for

The Claude API is a metered, per-token service that is worth it when you are building a product on top of a model and want strong reasoning per dollar, and not worth it when a flat subscription or a cheaper provider tier would do the same job. The pricing is transparent: Opus 4.8 at $5 / $25, Sonnet 4.6 at $3 / $15, and Haiku 4.5 at $1 / $5 per million tokens, with output roughly 5x input across the board.

Worth it for: developers shipping chatbots, RAG assistants, and coding agents who will use caching and batching, and who value Claude's coding and long-document quality. At realistic token counts, a serious workload often runs in the low hundreds of dollars per month, not the thousands people fear.

Not worth it for: anyone who just wants to chat with Claude (use the $20/month Pro plan instead, the API is more expensive and more work), and anyone running massive trivial-classification volume where a cheaper small model from another provider wins on raw price.

If you are running these costs as part of a business, API spend is a deductible software expense; our network covers self-employed tax deductions including the AI tools you build on. And whichever model you land on, re-check the live rate card before you budget: at the time of writing, the figures here are current as of 2026-05-30verified 2026-05-30, but Anthropic ships new versions often. Confirm at claude.com/pricing and the pricing docs.

Frequently asked questions about Anthropic Claude API pricing

How much does the Claude API cost per token?

Pricing is quoted per million tokens (MTok). As of May 2026, Opus 4.8 costs $5 input / $25 output, Sonnet 4.6 costs $3 / $15, and Haiku 4.5 costs $1 / $5 per million tokens. Output is roughly five times input across the lineup. Confirm current rates at platform.claude.com/docs.

Which Claude model is cheapest for production use?

Haiku 4.5 at $1 input / $5 output per MTok is the cheapest current Claude model, five times cheaper than Opus on input. Use it for classification, extraction, and routing, and only move up to Sonnet or Opus on the requests that genuinely need stronger reasoning.

Is there a free Claude API tier?

There is no ongoing free API tier. New accounts get a small one-time amount of free credits to test the API; after that every request is billed per token. The $0 Free plan on claude.ai is a separate consumer chat product and does not include API access.

How does prompt caching reduce Claude API costs?

Caching stores a reused prefix so repeat calls pay about 0.1x the input rate on the cached tokens, roughly a 90 percent saving. Opus cached input drops from $5 to about $0.50 per MTok. You pay a one-time write premium (1.25x for a 5-minute cache, 2x for a 1-hour cache), so caching pays off when context is reused several times.

How much does the Anthropic Batch API save?

The Batch API gives a flat 50 percent discount on input and output for requests processed within 24 hours. Opus 4.8 batch is $2.50 / $12.50, Sonnet 4.6 is $1.50 / $7.50, and Haiku 4.5 is $0.50 / $2.50 per MTok. There is no quality trade-off, only a delay, so it suits any non-interactive workload.

Is the Claude API cheaper than OpenAI and Gemini?

It depends on the tier. Haiku 4.5 at $1 / $5 is competitive but rarely the absolute cheapest small model. Sonnet at $3 / $15 and Opus at $5 / $25 sit in the same band as comparable mid-tier and flagship models from OpenAI and Google. Claude tends to win on coding and long-document quality per dollar rather than on raw sticker price.

Does data residency (US-only inference) affect Claude API pricing?

Yes, slightly. Some current models support US-only inference via the inference_geo setting at a 1.1x multiplier, so budget roughly 10 percent above the standard rate on supported models when compliance requires US-only processing.

Save
Dashboard

From our network

Best AI Tools for Amazon Sellers - bagengine.comBest AI Courses 2026 - edubracket.comBest Accounting Software for Online Sellers - ceocult.com