Key findings

Roughly 1,000x cost decay on the frontier-mini tier between GPT-4 32K in Mar 2023 and Gemini 2.0 Flash in Feb 2025. See chart.
Claude Opus pricing was sticky for 26 months, holding $15/$75 per million from Mar 2024 through May 2026 across both Opus 3 and Opus 4. See chart.
Gemini Flash sets the floor every release, undercutting all closed-weight competitors at launch. See chart.
Open-weight hosted pricing leads each cost cut by roughly one quarter (DeepSeek V2 at $0.14/M predated GPT-4o mini at $0.15/M by two months). See chart.
Output is now 8x more expensive than input on the GPT-5 tier, up from 1:1 parity on GPT-3.5. The input-output spread reflects compute economics of generation versus prefill. See chart.
Intelligence-per-dollar improved roughly 300x in 23 months on the frontier-mini tier, measured as cost per MMLU point. See chart.
GPT-4 list price compressed 12x in 17 months, from $30/$60 per million on the original GPT-4 to $2.50/$10 on GPT-4o by Aug 2024. See chart.
DeepSeek V3 raised input price 1.9x over V2 ($0.14 to $0.27/M) while output price roughly quadrupled ($0.28 to $1.10/M), the only frontier model in the dataset outside Anthropic's Haiku tier that increased price on a successor release. See chart.

Methodology

This study tracks list price for input and output tokens across 32 priced versions of 12 frontier model families, from GPT-3.5 Turbo (March 2023) through GPT-5, Claude 4, Gemini 2.0 Flash, Llama 3.1 405B, DeepSeek V3, and Mistral Large 2 (through May 2026). Inclusion criteria:

Public API access at any point during the 2022-11 to 2026-05 window
Documented per-token list price in USD on the standard tier (not batch, not cached, not fine-tuned, not enterprise contract)
At least one frontier-mini, frontier-mid, or frontier-flagship release in the family

For each version we recorded input price per million tokens, output price per million tokens, the effective date the price went live (not the announcement date), the deprecation or successor date, the listed context window, and the primary source URL. Where the live provider page no longer reflects the historical price, we cited the closest archive.org Wayback Machine snapshot, included as archive_url in the open dataset.

The cost-per-reference-task metric normalizes prices to a fixed prompt shape: 1,000 input tokens plus 500 output tokens, computed as (input/1e6 * 1000) + (output/1e6 * 500), where input and output are list prices in USD per million tokens. The shape is chosen to reflect a typical single-turn agent task (a moderate prompt with a half-page response) and to keep the time-series comparable across models with different input-output spreads. Other prompt shapes will rank models differently; the 1k/500 shape is published alongside the raw per-token prices so any reader can recompute under their own assumptions.
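The normalization above can be sketched in a few lines of Python. The function name `cost_per_task` and its default arguments are illustrative assumptions, not part of the published dataset; the prices are the ones shown in the dataset preview.

```python
def cost_per_task(input_per_m: float, output_per_m: float,
                  input_tokens: int = 1_000, output_tokens: int = 500) -> float:
    """USD cost of one task, given list prices in USD per million tokens."""
    return (input_per_m / 1e6) * input_tokens + (output_per_m / 1e6) * output_tokens


# List prices (USD per million tokens) taken from the dataset preview.
gpt4_32k = cost_per_task(60.00, 120.00)
claude_3_haiku = cost_per_task(0.25, 1.25)
gemini_15_flash = cost_per_task(0.075, 0.30)

print(f"GPT-4 (32K):      ${gpt4_32k:.6f}")
print(f"Claude 3 Haiku:   ${claude_3_haiku:.6f}")
print(f"Gemini 1.5 Flash: ${gemini_15_flash:.6f}")
```

Passing different `input_tokens` and `output_tokens` values recomputes the same metric under another prompt shape.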

Open dataset and per-version aggregates are at data.json under a CC-BY 4.0 license. For broader context on AI tool pricing across the consumer and prosumer layers, see our companion study AI Tools Statistics 2026.

Limitations and exclusions: Enterprise discounts and committed-use pricing are opaque and excluded; real spend at scale is typically 20-50% below list. Batch API pricing (often 50% off list) is excluded for comparability. Prompt caching discounts (now offered by OpenAI, Anthropic, Google, and DeepSeek) are excluded; including caching would push the effective floor lower. Fine-tuned model inference is excluded. Vision, audio, and embedding pricing are excluded; only text input and output tokens are tracked. Llama and Mistral pricing reflects hosted endpoints (Together, Fireworks); self-hosted economics differ. The reference-task framing is a single shape; output-heavy or input-heavy workloads can rank the same models differently.
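The sensitivity to prompt shape can be illustrated with two models from the dataset preview. This is a minimal sketch: the output-heavy shape (200 input tokens, 5,000 output tokens) is a hypothetical workload chosen for illustration, and the helper names are assumptions.

```python
def cost_per_task(input_per_m, output_per_m, input_tokens, output_tokens):
    """USD cost of one task from list prices in USD per million tokens."""
    return (input_per_m * input_tokens + output_per_m * output_tokens) / 1e6


# (input $/M, output $/M) from the dataset preview.
PRICES = {
    "Gemini 1.5 Flash": (0.075, 0.30),
    "DeepSeek V2": (0.14, 0.28),
}

# Prompt shapes as (input tokens, output tokens). The second is hypothetical.
SHAPES = {
    "reference": (1_000, 500),
    "output-heavy": (200, 5_000),
}


def cheapest(shape):
    """Name of the cheapest model in PRICES for the given prompt shape."""
    in_tok, out_tok = shape
    return min(PRICES, key=lambda m: cost_per_task(*PRICES[m], in_tok, out_tok))


for name, shape in SHAPES.items():
    print(f"{name}: cheapest is {cheapest(shape)}")
```

Under these prices the cheaper model flips between the two shapes, because DeepSeek V2 lists a higher input price but a slightly lower output price than Gemini 1.5 Flash.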

Finding 1: Roughly 1,000x cost decay on the frontier-mini tier

Headline: $0.12 per task on GPT-4 32K (Mar 2023) compressed to $0.000225 per task on Gemini 1.5 Flash (May 2024), a roughly 530x cut in 14 months

The frontier-mini cost-per-task fell roughly 1,000x when measured peak-to-floor across the full 32-version dataset. The trajectory is not smooth; it is a staircase of step-changes triggered by each new generation. GPT-3.5 Turbo set the first floor at $0.0030 per task in March 2023. The next floor came from Claude 3 Haiku at $0.000875 in March 2024. Gemini 1.5 Flash dropped that to $0.000225 in May 2024. GPT-4o mini followed at $0.00045 in July 2024, about twice the Flash floor. The Gemini Flash family held the closed-weight floor through Feb 2025, when 2.0 Flash launched at $0.0003 per task. The Stanford HAI 2024 AI Index report ("Cost of inference"), Epoch AI's compute trends work, and the State of AI annual report all corroborate the order-of-magnitude shape, but no public dataset had shipped the per-version raw series with primary citations before this study.

Source: Nesyona price-decay dataset, computed from each provider's pricing page on the effective date (with archive.org fallback). Cost per task = (input/1M * 1,000) + (output/1M * 500). The frontier-mini floor (green line) fell ~13x from GPT-3.5 Turbo ($0.0030) to Gemini 1.5 Flash ($0.000225); the GPT-4 32K peak (red dot) puts the full peak-to-floor decay at roughly 1,000x.
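The staircase can be reproduced as a running minimum over launches ordered by effective date. The sketch below is restricted to the four closed-weight mini launches named in this finding, with prices and dates from the dataset preview; `floor_steps` is an illustrative helper, not a function shipped with the dataset.

```python
from datetime import date


def cost_per_task(input_per_m, output_per_m, input_tokens=1_000, output_tokens=500):
    """USD cost of the reference task from list prices in USD per million tokens."""
    return (input_per_m * input_tokens + output_per_m * output_tokens) / 1e6


# (model, effective date, input $/M, output $/M) from the dataset preview.
LAUNCHES = [
    ("GPT-3.5 Turbo (0301)", date(2023, 3, 1), 2.00, 2.00),
    ("Claude 3 Haiku", date(2024, 3, 13), 0.25, 1.25),
    ("Gemini 1.5 Flash", date(2024, 5, 14), 0.075, 0.30),
    ("GPT-4o mini", date(2024, 7, 18), 0.15, 0.60),
]


def floor_steps(launches):
    """Return the launches that set a new running-minimum cost per task."""
    steps, floor = [], float("inf")
    for model, when, inp, out in sorted(launches, key=lambda row: row[1]):
        cost = cost_per_task(inp, out)
        if cost < floor:
            floor = cost
            steps.append((model, when, cost))
    return steps


steps = floor_steps(LAUNCHES)
for model, when, cost in steps:
    print(f"{when}  {model:<22} ${cost:.6f}")
```

Each returned row is one stair in the curve. In this subset GPT-4o mini does not appear, because its reference-task cost is above the floor already set by Gemini 1.5 Flash.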

Finding 2: Claude Opus held its launch price for 26 months across two model generations

Headline: $15 input / $75 output per million from Mar 2024 (Opus 3) through May 2026 (Opus 4)

Anthropic priced Claude 3 Opus at $15 input and $75 output per million tokens at launch on March 4, 2024, and priced Claude 4 Opus at the identical $15/$75 on May 22, 2025. Across the same window, Sonnet pricing held flat at $3/$15, while Haiku stepped up with each generation: Claude 3 Haiku at $0.25/$1.25 (March 2024), Claude 3.5 Haiku at $0.80/$4 (November 2024), Claude 4 Haiku at $1/$5 (December 2025). The strategic read: Anthropic chose to ship a faster, smarter model at the same price rather than cut price on the flagship, while letting the price-sensitive lower tiers absorb the competitive pressure. This is the opposite of OpenAI's pattern, where each GPT-4 successor cut flagship list price (GPT-4 to GPT-4 Turbo to GPT-4o).

Source: Nesyona price-decay dataset. Anthropic Opus list price has not changed since Mar 2024 across Opus 3 and Opus 4. Sonnet held flat at $3/$15 across three generations. Haiku is the only tier where input price actually rose between Claude 3 Haiku and Claude 4 Haiku.

Finding 3: Gemini Flash sets the price floor every release

Headline: 1.5 Flash at $0.075 input / $0.30 output (May 2024); 2.0 Flash at $0.10 / $0.40 (Feb 2025)

Gemini Flash undercuts the closed-weight competitive set at every launch. At the May 2024 1.5 Flash launch, the next cheapest closed-weight option was Claude 3 Haiku at $0.25 input, 3.3x more expensive. By the GPT-4o mini launch two months later at $0.15 input, Gemini was still 2x cheaper. Gemini 2.0 Flash in Feb 2025 held the floor at $0.10 input. The undercut is structural rather than promotional: Google is the only frontier-model vendor that owns its full TPU stack and can amortize the silicon across consumer Search and Workspace AI workloads, which lets Flash list price approach marginal cost rather than including a sales-cycle margin. This pattern shows up in every Stanford HAI AI Index chapter on inference cost since 2024.

Source: Nesyona price-decay dataset. At each Gemini Flash release, the closed-weight competitive set sits 1.5x to 8x above the Flash list price. DeepSeek V2 (May 2024) is the only model that priced below 1.5 Flash at launch on either axis, and only on output ($0.28 versus $0.30 per million); its $0.14 input price sat above Flash's $0.075.

Finding 4: Open-weight hosted pricing leads each closed-weight cost cut by roughly one quarter

Headline: DeepSeek V2 at $0.14/M input (May 2024) preceded GPT-4o mini at $0.15/M (Jul 2024) by 73 days

DeepSeek V2 listed input pricing at $0.14 per million on May 6, 2024, two months before GPT-4o mini launched at $0.15 per million on July 18, 2024. Llama 3 70B on Together had been at $0.90/$0.90 since April 2024. The pattern: open-weight hosted endpoints establish the price floor that closed-weight providers then match at the next product launch. The mechanism is competitive rather than coincidental. Together AI, Fireworks, and DeepSeek itself publish API pricing the same week the model weights become public, which establishes a marginal-cost reference rate (compute plus modest hosting margin). Closed-weight providers then have one quarter to ship a competitive frontier-mini at or near the open-weight rate before defection accelerates. Per Meta's Llama 3.1 announcement and the corresponding Together pricing page, Llama 3.1 405B priced at $3.50/$3.50 in July 2024 sat between GPT-4 Turbo ($10/$30) and the next mini cycle.

Source: Nesyona price-decay dataset. Orange dots are open-weight hosted launches (Together, Fireworks, DeepSeek direct). Purple dots are closed-weight launches (OpenAI, Google, Anthropic). Each open-weight launch establishes a competitive floor that the next closed-weight mini-cycle release matches within one quarter.
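The lead time quoted in the headline is a plain date difference. A minimal sketch using the two effective dates given above (`lead_time_days` is an illustrative helper name):

```python
from datetime import date


def lead_time_days(open_weight_launch: date, closed_weight_launch: date) -> int:
    """Days by which the open-weight price preceded the closed-weight one."""
    return (closed_weight_launch - open_weight_launch).days


deepseek_v2 = date(2024, 5, 6)    # $0.14/M input
gpt_4o_mini = date(2024, 7, 18)   # $0.15/M input

lead = lead_time_days(deepseek_v2, gpt_4o_mini)
print(f"DeepSeek V2 led GPT-4o mini by {lead} days")
```

The same helper can be applied to any other open-weight and closed-weight pair in the dataset to test the one-quarter claim.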

Finding 5: Output is now 8x more expensive than input on the GPT-5 tier

Headline: GPT-3.5 was 1:1 input-output; GPT-5 is 1:8

The input-output spread widened from 1:1 on GPT-3.5 (March 2023) to 1:8 on GPT-5 (August 2025). Claude consistently runs 1:5 across all tiers (Opus, Sonnet, Haiku), Gemini Flash runs 1:4, and DeepSeek V3 runs 1:4. The widening spread reflects compute economics: input prefill is parallelizable across the prompt, while output generation is autoregressive and sequential. As context windows grew from 4K (GPT-3.5) to 1M (Gemini 1.5) and 400K (GPT-5), prefill economics improved faster than generation economics. The pricing implication for builders: output-heavy workloads (writing, reasoning, code generation) face a smaller cost cut than input-heavy workloads (RAG, summarization, classification). The same 1,000x decay headline does not apply to a coding agent generating 5K tokens of code per turn.

Source: Nesyona price-decay dataset. The 1:1 GPT-3.5 baseline reflected early uniform compute pricing. By GPT-5, output is priced 8x input, which reflects the structural difference between parallelizable prefill and sequential generation. Claude has been the most consistent at 1:5 across all generations and tiers.
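The spread and its effect on a bill can be checked directly from the list prices quoted in this study. The sketch below computes the output-to-input price ratio per model and the share of a task's cost that comes from output tokens. The helper names are assumptions, and the 1,000-in / 5,000-out shape stands in for the coding-agent example above.

```python
# (input $/M, output $/M) as quoted in this study.
PRICES = {
    "GPT-3.5 Turbo (0301)": (2.00, 2.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "DeepSeek V3": (0.27, 1.10),
    "GPT-5": (1.25, 10.00),
}


def output_input_ratio(input_per_m, output_per_m):
    """How many times more an output token costs than an input token."""
    return output_per_m / input_per_m


def output_share(input_per_m, output_per_m, input_tokens, output_tokens):
    """Fraction of a task's cost attributable to output tokens."""
    out_cost = output_per_m * output_tokens
    return out_cost / (input_per_m * input_tokens + out_cost)


for model, (inp, out) in PRICES.items():
    print(f"{model:<22} 1:{output_input_ratio(inp, out):.1f}")

ref_share = output_share(*PRICES["GPT-5"], 1_000, 500)
agent_share = output_share(*PRICES["GPT-5"], 1_000, 5_000)
print(f"GPT-5 output share, reference task: {ref_share:.0%}")
print(f"GPT-5 output share, 5K-token generation: {agent_share:.0%}")
```

The higher the output share, the more a workload's cost tracks the output price alone, which is why output-heavy workloads see less of the headline decay.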

Finding 6: Intelligence-per-dollar improved roughly 300x in 23 months on the frontier-mini tier

Headline: $/MMLU-point fell from ~$0.0008 (GPT-3.5 Mar 23, MMLU 70) to ~$0.0000027 (Gemini 2.0 Flash Feb 25, MMLU 87)

Cost-per-task is only half the story; the other half is capability per dollar. Using MMLU as the most-published benchmark and dividing cost-per-reference-task by MMLU score gives a rough intelligence-per-dollar proxy. The frontier-mini intelligence-per-dollar improved from $0.0008/MMLU-point on GPT-3.5 Turbo to roughly $0.0000027 on Gemini 2.0 Flash, about 300x in 23 months. The flagship tier moved less dramatically: GPT-4 Mar 2023 at $0.06 per task and MMLU 86 gives $0.0007/point, while GPT-5 Aug 2025 at $0.00625 per task and an estimated MMLU 91 gives roughly $0.000069/point, about a 10x improvement on the flagship tier over 29 months. Cost per MMLU point is falling fastest at the bottom of the price ladder, not the top, which is consistent with the Gemini Flash and DeepSeek pattern in Findings 3 and 4. MMLU scores cited from Stanford HAI AI Index 2025, Hendrycks et al. 2020 (MMLU), and provider technical reports.

Source: Nesyona price-decay dataset; MMLU scores from provider technical reports and Stanford HAI AI Index 2025. The frontier-mini line dropped roughly 300x in 23 months. The flagship line dropped roughly 10x over 29 months (Mar 2023 to Aug 2025), confirming that intelligence-per-dollar is improving fastest at the bottom of the ladder.
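The proxy is a single division, so the flagship figures quoted above can be recomputed in a few lines. This is a sketch using only the per-task costs and MMLU scores stated in the text (the GPT-5 score is the study's estimate); `cost_per_mmlu_point` is an illustrative name.

```python
def cost_per_mmlu_point(cost_per_task: float, mmlu_score: float) -> float:
    """USD per reference task divided by MMLU score (lower is better)."""
    return cost_per_task / mmlu_score


# Flagship tier, figures as stated in the text.
gpt4 = cost_per_mmlu_point(0.06, 86)      # GPT-4, Mar 2023
gpt5 = cost_per_mmlu_point(0.00625, 91)   # GPT-5, Aug 2025 (estimated MMLU)

improvement = gpt4 / gpt5

print(f"GPT-4: ${gpt4:.6f} per MMLU point")
print(f"GPT-5: ${gpt5:.6f} per MMLU point")
print(f"Improvement: {improvement:.1f}x")
```

Because MMLU scores on both tiers sit in a narrow band (70 to 91 in this study), almost all of the movement in this proxy comes from the price term.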

Finding 7: GPT-4 list price compressed 12x in 17 months

Headline: $30/$60 (Mar 2023 GPT-4 8K) to $2.50/$10 (Aug 2024 GPT-4o); 12x input, 6x output

OpenAI cut GPT-4 generation flagship list price three times in 17 months. The original GPT-4 launched March 2023 at $30 input / $60 output per million on the 8K context, with $60/$120 on the 32K variant. GPT-4 Turbo in November 2023 cut to $10/$30, a 3x input compression. GPT-4o in May 2024 cut to $5/$15. The August 2024 GPT-4o refresh halved input again to $2.50 and cut output from $15 to $10. Across the same 17-month window, OpenAI also launched GPT-4o mini at $0.15/$0.60, a 200x cut from original GPT-4 input pricing on the small tier. The pace is roughly one cut every six months on the flagship and two new mini tiers in between, which sets the cadence the rest of the market now competes against.

Source: Nesyona price-decay dataset. OpenAI cut GPT-4 flagship input price three times in 17 months (to $10, $5, and $2.50), plus a 200x mini-tier launch. The cadence of roughly one flagship cut per six months sets the market clock.
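The compression figures follow from the four flagship price points listed above. A short sketch, with month-level dates as given in the text and illustrative helper names:

```python
# (label, year-month, input $/M, output $/M) as listed in this finding.
LADDER = [
    ("GPT-4 (8K)", "2023-03", 30.00, 60.00),
    ("GPT-4 Turbo", "2023-11", 10.00, 30.00),
    ("GPT-4o", "2024-05", 5.00, 15.00),
    ("GPT-4o (Aug refresh)", "2024-08", 2.50, 10.00),
]


def months_between(start: str, end: str) -> int:
    """Whole calendar months between two 'YYYY-MM' strings."""
    sy, sm = map(int, start.split("-"))
    ey, em = map(int, end.split("-"))
    return (ey - sy) * 12 + (em - sm)


first, last = LADDER[0], LADDER[-1]
input_compression = first[2] / last[2]
output_compression = first[3] / last[3]
window_months = months_between(first[1], last[1])

# Step-by-step input cuts between consecutive price points.
input_steps = [prev[2] / cur[2] for prev, cur in zip(LADDER, LADDER[1:])]

print(f"Input: {input_compression:.0f}x, output: {output_compression:.0f}x "
      f"over {window_months} months")
print("Input cut at each step:", [f"{s:.1f}x" for s in input_steps])
```

The per-step list makes it easy to see that the largest single input cut was the first one, from GPT-4 to GPT-4 Turbo.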

Finding 8: DeepSeek V3 raised price over V2; the only frontier model that did

Headline: V2 input $0.14, V3 input $0.27 (1.9x); V2 output $0.28, V3 output $1.10 (3.9x)

Outside Anthropic's Haiku tier (Finding 10), only one frontier model across the 32 versions tracked raised price on its successor: DeepSeek V3 priced at $0.27 input and $1.10 output per million on December 26, 2024, up from V2's $0.14/$0.28 in May 2024. The mechanism is straightforward: V3 is a substantially larger MoE model (671B total, 37B active) with materially higher capability than V2, and DeepSeek captured some of that capability gain in price. Even after the increase, V3 sits above GPT-4o mini's $0.15/$0.60 on both axes but below Claude 3.5 Haiku's $0.80/$4 on both. This does not break the broader decay pattern; it confirms that within a vendor's own ladder, capability-per-dollar improves but absolute price can rise when the model class moves up. Reported in DeepSeek's pricing documentation and the V3 technical paper.
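The size of the increase depends on which axis is measured. The sketch below recomputes the input and output ratios from the headline and adds the reference-task ratio (1,000 input plus 500 output tokens) from the methodology; variable names are illustrative.

```python
def cost_per_task(input_per_m, output_per_m, input_tokens=1_000, output_tokens=500):
    """USD cost of the reference task from list prices in USD per million tokens."""
    return (input_per_m * input_tokens + output_per_m * output_tokens) / 1e6


V2 = {"input": 0.14, "output": 0.28}   # May 2024
V3 = {"input": 0.27, "output": 1.10}   # Dec 2024

input_ratio = V3["input"] / V2["input"]
output_ratio = V3["output"] / V2["output"]
task_ratio = (cost_per_task(V3["input"], V3["output"])
              / cost_per_task(V2["input"], V2["output"]))

print(f"Input:          {input_ratio:.1f}x")
print(f"Output:         {output_ratio:.1f}x")
print(f"Reference task: {task_ratio:.1f}x")
```

On the reference shape the blended increase lands between the input and output ratios, since the task mixes both token types.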

Finding 9: Context window pricing is no longer separately tiered for most providers

Headline: GPT-4 charged 2x for 32K vs 8K in 2023; GPT-4o offers 128K at one rate

In March 2023, OpenAI charged $60/$120 for the 32K variant of GPT-4 versus $30/$60 for the 8K version, a clean 2x premium for context. By May 2024 GPT-4o offered 128K at a single price tier; Anthropic priced the full 200K Claude 3 family at flat per-token rates; Google priced 1M-context Gemini 1.5 Pro at flat rates with a small step at over 128K input. Only Gemini retains a context-window pricing tier (the over-128K input surcharge), and DeepSeek removed its V2 over-32K premium in V3. The economic implication: builders no longer pay a context-shape penalty, and prompt-stuffing strategies (long system prompts, RAG context injection) became viable at flat rates. This compounds the cost-per-task decay because the same 1,000-input-token reference shape now sits well within every model's flat-rate context band.

Finding 10: Claude 3.5 Haiku raised input price 3.2x over Claude 3 Haiku

Headline: $0.25 input on Claude 3 Haiku; $0.80 on Claude 3.5 Haiku

The Haiku tier is the only Anthropic tier that has moved on price across generations, and it moved up. Claude 3 Haiku launched March 2024 at $0.25 input / $1.25 output; Claude 3.5 Haiku in November 2024 priced at $0.80 input / $4 output, a 3.2x raise on both axes. Claude 4 Haiku in December 2025 settled at $1/$5, a smaller 1.25x step. The pattern matches DeepSeek V3 (Finding 8), confirming that within-vendor mini-tier price is non-monotonic when the underlying model class steps up substantially. GPT-4o is not a comparable case: its May 2024 $5 input was already below GPT-4 Turbo's $10 and was cut again to $2.50 in the August refresh. The Anthropic Haiku case is conspicuous because Sonnet and Opus stayed flat across the same generations.
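Non-monotonic pricing within a tier can be detected by scanning each ladder for a successor priced above its predecessor. A minimal sketch using the Opus and Haiku input prices quoted in this study (`price_increases` is an illustrative helper):

```python
# Input price in USD per million tokens, in release order, as quoted in this study.
LADDERS = {
    "Opus": [("Claude 3 Opus", 15.00), ("Claude 4 Opus", 15.00)],
    "Haiku": [
        ("Claude 3 Haiku", 0.25),
        ("Claude 3.5 Haiku", 0.80),
        ("Claude 4 Haiku", 1.00),
    ],
}


def price_increases(ladder):
    """Return (predecessor, successor, ratio) for every step that raised price."""
    return [
        (prev_name, name, price / prev_price)
        for (prev_name, prev_price), (name, price) in zip(ladder, ladder[1:])
        if price > prev_price
    ]


for tier, ladder in LADDERS.items():
    raises = price_increases(ladder)
    if not raises:
        print(f"{tier}: no increases")
    for prev_name, name, ratio in raises:
        print(f"{tier}: {prev_name} -> {name} raised input {ratio:.2f}x")
```

Running the same scan over every family in the full dataset would list all successor releases that raised price, rather than relying on a manual read of the table.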

Dataset preview

First 12 of 32 priced versions in the dataset; the full dataset is at data.json under CC-BY 4.0:

Model	Provider	In $/M	Out $/M	Task $	Effective
GPT-3.5 Turbo (0301)	OpenAI	2.00	2.00	0.0030	2023-03-01
GPT-4 (32K)	OpenAI	60.00	120.00	0.1200	2023-03-14
Claude 1.3	Anthropic	11.02	32.68	0.0274	2023-05-01
Llama 2 70B (Together)	Meta / Together	0.90	0.90	0.0014	2023-07-18
GPT-4 Turbo	OpenAI	10.00	30.00	0.0250	2023-11-06
GPT-3.5 Turbo (0125)	OpenAI	0.50	1.50	0.0013	2024-01-25
Claude 3 Haiku	Anthropic	0.25	1.25	0.0009	2024-03-13
Claude 3 Opus	Anthropic	15.00	75.00	0.0525	2024-03-04
Gemini 1.5 Flash	Google	0.075	0.30	0.00023	2024-05-14
DeepSeek V2	DeepSeek	0.14	0.28	0.00028	2024-05-06
GPT-4o mini	OpenAI	0.15	0.60	0.00045	2024-07-18
GPT-5	OpenAI	1.25	10.00	0.00625	2025-08-07
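The Task $ column can be re-derived from the two price columns. The sketch below parses a few rows of the preview as tab-separated text and checks each listed value against the methodology formula, allowing for the rounding used in the table. The column names match the preview header; the full data.json schema is not shown here, so this works on the preview layout only.

```python
import csv
import io

# A subset of the preview rows, tab-separated as in the table above.
PREVIEW_TSV = (
    "Model\tProvider\tIn $/M\tOut $/M\tTask $\tEffective\n"
    "GPT-4 (32K)\tOpenAI\t60.00\t120.00\t0.1200\t2023-03-14\n"
    "Claude 1.3\tAnthropic\t11.02\t32.68\t0.0274\t2023-05-01\n"
    "Llama 2 70B (Together)\tMeta / Together\t0.90\t0.90\t0.0014\t2023-07-18\n"
    "Gemini 1.5 Flash\tGoogle\t0.075\t0.30\t0.00023\t2024-05-14\n"
    "GPT-5\tOpenAI\t1.25\t10.00\t0.00625\t2025-08-07\n"
)


def task_cost(row):
    """Reference-task cost: 1,000 input tokens plus 500 output tokens."""
    return (float(row["In $/M"]) * 1_000 + float(row["Out $/M"]) * 500) / 1e6


def matches_listed(row):
    """True if the listed Task $ equals the computed cost at its printed precision."""
    listed = row["Task $"]
    decimals = len(listed.split(".")[1])
    tolerance = 0.5 * 10 ** -decimals + 1e-12  # half a unit in the last printed digit
    return abs(task_cost(row) - float(listed)) <= tolerance


rows = list(csv.DictReader(io.StringIO(PREVIEW_TSV), delimiter="\t"))
for row in rows:
    status = "ok" if matches_listed(row) else "MISMATCH"
    print(f"{row['Model']:<24} computed ${task_cost(row):.6f}  listed ${row['Task $']}  {status}")
```

The tolerance is tied to the number of digits printed in each row, so a value such as 0.00023 is accepted for a computed 0.000225.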

This study is part of Nesyona's AI tool economy research line. Companion studies:

AI Tools Statistics 2026 covers consumer and prosumer AI tool pricing across the broader market and complements this API-tier focus.
Forthcoming: AI Tool Pricing Reality 2026: 200 Tools, Real Cost-per-Outcome, and Why 60% of Free Tiers Are Loss Leaders (Q4 2026).

Embed this study

Embed the headline chart

Free to embed under CC-BY 4.0. Attribution requires a link to nesyona.com/research/ai-token-price-decay-2026.

<iframe src="https://nesyona.com/research/ai-token-price-decay-2026/embed/decay-curve.html"
  width="100%" height="400" frameborder="0" loading="lazy"
  title="AI API token price decay 2022-2026, Nesyona Research">
</iframe>

Press kit and downloads

Full press kit at /press/, machine-readable dataset at data.json. Media inquiries via the contact on the press page.

AI API Token Price Decay 2022-2026: 12 Frontier Models Tracked

How much have AI API token prices fallen since 2022? A 12-model time-series across 32 versions through 2026.
