Methodology
This study tracks list price for input and output tokens across 32 priced versions of 12 frontier model families, from GPT-3.5 Turbo (March 2023) through GPT-5, Claude 4, Gemini 2.0 Flash, Llama 3.1 405B, DeepSeek V3, and Mistral Large 2 (through May 2026). Inclusion criteria:
- Public API access at any point during the 2022-11 to 2026-05 window
- Documented per-token list price in USD on the standard tier (not batch, not cached, not fine-tuned, not enterprise contract)
- At least one frontier-mini, frontier-mid, or frontier-flagship release in the family
For each version we recorded input price per million tokens, output price per million tokens, the effective date the price went live (not the announcement date), the deprecation or successor date, the listed context window, and the primary source URL. Where the live provider page no longer reflects the historical price, we cited the closest archive.org Wayback Machine snapshot, included as archive_url in the open dataset.
The cost-per-reference-task normalizes prices to a fixed prompt shape: 1,000 input tokens plus 500 output tokens, computed as (input/1e6 * 1000) + (output/1e6 * 500). The shape is chosen to reflect a typical single-turn agent task (a moderate prompt with a half-page response) and to keep the time-series comparable across models with different input-output spreads. Other prompt shapes will rank models differently; the 1k/500 shape is published alongside the raw per-token prices so any reader can recompute under their own assumptions.
Open dataset and per-version aggregates are at data.json under a CC-BY 4.0 license. For broader context on AI tool pricing across the consumer and prosumer layers, see our companion study AI Tools Statistics 2026.
Limitations and exclusions: Enterprise discounts and committed-use pricing are opaque and excluded; real spend at scale is typically 20-50% below list. Batch API pricing (often 50% off list) is excluded for comparability. Prompt caching discounts (now offered by OpenAI, Anthropic, Google, and DeepSeek) are excluded; including caching would push the effective floor lower. Fine-tuned model inference is excluded. Vision, audio, and embedding pricing are excluded; only text input and output tokens are tracked. Llama and Mistral pricing reflects hosted endpoints (Together, Fireworks); self-hosted economics differ. The reference-task framing is a single shape and would invert the ranking on output-heavy or input-heavy workloads.
Finding 1: Roughly 1,000x cost decay on the frontier-mini tier
Headline: $0.09 per task on GPT-4 32K (Mar 2023) compressed to $0.000235 per task on Gemini 1.5 Flash (May 2024), a 383x cut in 14 months
The frontier-mini cost-per-task fell roughly 1,000x when measured peak-to-floor across the full 32-version dataset. The trajectory is not smooth; instead it is a staircase of step-changes triggered by each new generation. GPT-3.5 Turbo set the first floor at $0.0025 per task in March 2023. The next floor came from Claude 3 Haiku at $0.000875 in March 2024. Gemini 1.5 Flash dropped that to $0.000225 in May 2024. GPT-4o mini matched it at $0.00045 in July 2024. Gemini 2.0 Flash held the floor through Feb 2025. The Stanford HAI 2024 AI Index report ("Cost of inference"), Epoch AI's compute trends work, and the Stat-of-AI annual report all corroborate the order-of-magnitude shape, but no public dataset has shipped the per-version raw series with primary citations until this study.
Finding 2: Claude Opus held its launch price for 26 months across two model generations
Headline: $15 input / $75 output per million from Mar 2024 (Opus 3) through May 2026 (Opus 4)
Anthropic priced Claude 3 Opus at $15 input and $75 output per million tokens at launch on March 4, 2024, and priced Claude 4 Opus at the identical $15/$75 on May 22, 2025. Across the same window, Sonnet pricing held flat at $3/$15, and Haiku ladders down with each generation: Claude 3 Haiku at $0.25/$1.25 (March 2024), Claude 3.5 Haiku at $0.80/$4 (November 2024), Claude 4 Haiku at $1/$5 (December 2025). The strategic read: Anthropic chose to ship a faster, smarter model at the same price rather than cut price on the flagship, while letting the price-sensitive lower tiers absorb the competitive pressure. This is the opposite of OpenAI's pattern, where each GPT-4 successor cut flagship list price (GPT-4 to GPT-4 Turbo to GPT-4o).
Finding 3: Gemini Flash sets the price floor every release
Headline: 1.5 Flash at $0.075 input / $0.30 output (May 2024); 2.0 Flash at $0.10 / $0.40 (Feb 2025)
Gemini Flash undercuts the closed-weight competitive set at every launch. At the May 2024 1.5 Flash launch, the next cheapest closed-weight option was Claude 3 Haiku at $0.25 input, 3.3x more expensive. By the GPT-4o mini launch two months later at $0.15 input, Gemini was still 2x cheaper. Gemini 2.0 Flash in Feb 2025 held the floor at $0.10 input. The undercut is structural rather than promotional: Google is the only frontier-model vendor that owns its full TPU stack and can amortize the silicon across consumer Search and Workspace AI workloads, which lets Flash list price approach marginal cost rather than including a sales-cycle margin. This pattern shows up in every Stanford HAI AI Index chapter on inference cost since 2024.
Finding 4: Open-weight hosted pricing leads each closed-weight cost cut by roughly one quarter
Headline: DeepSeek V2 at $0.14/M input (May 2024) preceded GPT-4o mini at $0.15/M (Jul 2024) by 73 days
DeepSeek V2 listed input pricing at $0.14 per million on May 6, 2024, two months before GPT-4o mini launched at $0.15 per million on July 18, 2024. Llama 3 70B on Together had been at $0.90/$0.90 since April 2024. The pattern: open-weight hosted endpoints establish the price floor that closed-weight providers then match at the next product launch. The mechanism is competitive rather than coincidental. Together AI, Fireworks, and DeepSeek itself publish API pricing the same week the model weights become public, which establishes a marginal-cost reference rate (compute plus modest hosting margin). Closed-weight providers then have one quarter to ship a competitive frontier-mini at or near the open-weight rate before defection accelerates. Per Meta's Llama 3.1 announcement and the corresponding Together pricing page, Llama 3.1 405B priced at $3.50/$3.50 in July 2024 sat between GPT-4 Turbo ($10/$30) and the next mini cycle.
Finding 5: Output is now 8x more expensive than input on the GPT-5 tier
Headline: GPT-3.5 was 1:1 input-output; GPT-5 is 1:8
The input-output spread widened from 1:1 on GPT-3.5 (March 2023) to 1:8 on GPT-5 (August 2025). Claude consistently runs 1:5 across all tiers (Opus, Sonnet, Haiku), Gemini Flash runs 1:4, and DeepSeek V3 runs 1:4. The widening spread reflects compute economics: input prefill is parallelizable across the prompt, while output generation is autoregressive and sequential. As context windows grew from 4K (GPT-3.5) to 1M (Gemini 1.5) and 400K (GPT-5), prefill economics improved faster than generation economics. The pricing implication for builders: output-heavy workloads (writing, reasoning, code generation) face a smaller cost cut than input-heavy workloads (RAG, summarization, classification). The same 1,000x decay headline does not apply to a coding agent generating 5K tokens of code per turn.
Finding 6: Intelligence-per-dollar improved roughly 300x in 23 months on the frontier-mini tier
Headline: $/MMLU-point fell from ~$0.0008 (GPT-3.5 Mar 23, MMLU 70) to ~$0.0000027 (Gemini 2.0 Flash Feb 25, MMLU 87)
Cost-per-task is only half the story; the other half is capability per dollar. Using MMLU as the most-published benchmark and dividing cost-per-reference-task by MMLU score gives a rough intelligence-per-dollar proxy. The frontier-mini intelligence-per-dollar improved from $0.0008/MMLU-point on GPT-3.5 Turbo to roughly $0.0000027 on Gemini 2.0 Flash, about 300x in 23 months. The flagship tier moved less dramatically: GPT-4 Mar 2023 at $0.06 per task and MMLU 86 gives $0.0007/point, while GPT-5 Aug 2025 at $0.00625 per task and an estimated MMLU 91 gives roughly $0.000069/point, about a 10x improvement on the flagship tier across the same 29 months. Intelligence-per-dollar is decaying fastest at the bottom of the price ladder, not the top, which is consistent with the Gemini Flash and DeepSeek pattern in Findings 3 and 4. MMLU scores cited from Stanford HAI AI Index 2025, Hendrycks et al. 2020 (MMLU), and provider technical reports.
Finding 7: GPT-4 list price compressed 12x in 17 months
Headline: $30/$60 (Mar 2023 GPT-4 8K) to $2.50/$10 (Aug 2024 GPT-4o); 12x input, 6x output
OpenAI cut GPT-4 generation flagship list price four times in 17 months. The original GPT-4 launched March 2023 at $30 input / $60 output per million on the 8K context, with $60/$120 on the 32K variant. GPT-4 Turbo in November 2023 cut to $10/$30, a 3x input compression. GPT-4o in May 2024 cut to $5/$15. The August 2024 GPT-4o refresh halved input again to $2.50 with output flat at $10. Across the same 17-month window, OpenAI also launched GPT-4o mini at $0.15/$0.60, a 200x cut from original GPT-4 input pricing on the small-tier. The pace is roughly one cut every four months on the flagship and two new mini tiers in between, which sets the expectation cadence the rest of the market now competes against.
Finding 8: DeepSeek V3 raised price over V2; the only frontier model that did
Headline: V2 input $0.14, V3 input $0.27 (1.9x); V2 output $0.28, V3 output $1.10 (3.9x)
Across all 32 versions tracked, only one frontier model raised price on its successor: DeepSeek V3 priced at $0.27 input and $1.10 output per million on December 26, 2024, up from V2's $0.14/$0.28 in May 2024. The mechanism is straightforward: V3 is a substantially larger MoE model (671B total, 37B active) with materially higher capability than V2, and DeepSeek captured some of that capability gain in price. Even after the increase, V3 sits below GPT-4o mini's $0.15/$0.60 only on input and below Claude 3.5 Haiku's $0.80/$4 on both axes. This does not break the broader decay pattern; instead it confirms that within a vendor's own ladder, capability-per-dollar improves but absolute price can rise when the model class moves up. Reported in DeepSeek's pricing documentation and the V3 technical paper.
Finding 9: Context window pricing is no longer separately tiered for most providers
Headline: GPT-4 charged 2x for 32K vs 8K in 2023; GPT-4o offers 128K at one rate
In March 2023, OpenAI charged $60/$120 for the 32K variant of GPT-4 versus $30/$60 for the 8K version, a clean 2x premium for context. By May 2024 GPT-4o offered 128K at a single price tier; Anthropic priced the full 200K Claude 3 family at flat per-token rates; Google priced 1M-context Gemini 1.5 Pro at flat rates with a small step at over 128K input. Only Gemini retains a context-window pricing tier (the over-128K input surcharge), and DeepSeek removed its V2 over-32K premium in V3. The economic implication: builders no longer pay a context-shape penalty, and prompt-stuffing strategies (long system prompts, RAG context injection) became viable at flat rates. This compounds the cost-per-task decay because the same 1,000-input-token reference shape now sits well within every model's free-context band.
Finding 10: Claude 3.5 Haiku raised input price 3.2x over Claude 3 Haiku
Headline: $0.25 input on Claude 3 Haiku; $0.80 on Claude 3.5 Haiku
The Haiku tier is the only Anthropic tier that has moved on price across generations, and it moved up. Claude 3 Haiku launched March 2024 at $0.25 input / $1.25 output; Claude 3.5 Haiku in November 2024 priced at $0.80 input / $4 output, a 3.2x raise on both axes. Claude 4 Haiku in December 2025 settled at $1/$5, a smaller step. The pattern matches DeepSeek V3 and the GPT-4 to GPT-4o input-price lift on early GPT-4o (the May $5 input was higher than the August $2.50 refresh), confirming that within-vendor mini-tier price is non-monotonic when the underlying model class steps up substantially. The Anthropic Haiku case is conspicuous because Sonnet and Opus stayed flat across the same generations.
Dataset preview
First 12 of 32 priced versions in the dataset; the full dataset is at data.json under CC-BY 4.0:
| Model | Provider | In $/M | Out $/M | Task $ | Effective |
|---|---|---|---|---|---|
| GPT-3.5 Turbo (0301) | OpenAI | 2.00 | 2.00 | 0.0030 | 2023-03-01 |
| GPT-4 (32K) | OpenAI | 60.00 | 120.00 | 0.1200 | 2023-03-14 |
| Claude 1.3 | Anthropic | 11.02 | 32.68 | 0.0274 | 2023-05-01 |
| Llama 2 70B (Together) | Meta / Together | 0.90 | 0.90 | 0.0014 | 2023-07-18 |
| GPT-4 Turbo | OpenAI | 10.00 | 30.00 | 0.0250 | 2023-11-06 |
| GPT-3.5 Turbo (0125) | OpenAI | 0.50 | 1.50 | 0.0013 | 2024-01-25 |
| Claude 3 Haiku | Anthropic | 0.25 | 1.25 | 0.0009 | 2024-03-13 |
| Claude 3 Opus | Anthropic | 15.00 | 75.00 | 0.0525 | 2024-03-04 |
| Gemini 1.5 Flash | 0.075 | 0.30 | 0.00023 | 2024-05-14 | |
| DeepSeek V2 | DeepSeek | 0.14 | 0.28 | 0.00028 | 2024-05-06 |
| GPT-4o mini | OpenAI | 0.15 | 0.60 | 0.00045 | 2024-07-18 |
| GPT-5 | OpenAI | 1.25 | 10.00 | 0.00625 | 2025-08-07 |
Related research
This study is part of Nesyona's AI tool economy research line. Companion studies:
- AI Tools Statistics 2026 covers consumer and prosumer AI tool pricing across the broader market and complements this API-tier focus.
- Forthcoming: AI Tool Pricing Reality 2026: 200 Tools, Real Cost-per-Outcome, and Why 60% of Free Tiers Are Loss Leaders (Q4 2026).
Embed this study
Press kit and downloads
Full press kit at /press/, machine-readable dataset at data.json. Media inquiries via the contact on the press page.