Nesyona Research // Data Study

AI API Token Price Decay 2022-2026: 12 Frontier Models Tracked

Cite this dataset: DOI 10.5281/zenodo.20632755 (CC-BY 4.0)

How much have AI API token prices fallen since 2022? A 12-model time-series across 32 versions through 2026.

Bottom line: Across 32 priced versions of 12 frontier model families, the cost of a standard 1,000-input / 500-output reference task fell from $0.12 on GPT-4 32K in March 2023 to $0.000225 on Gemini 1.5 Flash in May 2024, a compression of more than 500x from the peak to the frontier-mini floor; Gemini 2.0 Flash sat at $0.0003 in February 2025. The decay is non-monotonic: Claude Opus held at $15 input / $75 output per million tokens for 26 months straight, and Anthropic shipped Claude 4 Opus in May 2025 at the same price as Claude 3 Opus. Open-weight hosted pricing leads each closed-weight cost cut by roughly one quarter.

Methodology

This study tracks list price for input and output tokens across 32 priced versions of 12 frontier model families, from GPT-3.5 Turbo (March 2023) through GPT-5, Claude 4, Gemini 2.0 Flash, Llama 3.1 405B, DeepSeek V3, and Mistral Large 2 (through May 2026). Inclusion criteria:

For each version we recorded input price per million tokens, output price per million tokens, the effective date the price went live (not the announcement date), the deprecation or successor date, the listed context window, and the primary source URL. Where the live provider page no longer reflects the historical price, we cited the closest archive.org Wayback Machine snapshot, included as archive_url in the open dataset.

The cost-per-reference-task normalizes prices to a fixed prompt shape: 1,000 input tokens plus 500 output tokens, computed as (input/1e6 * 1000) + (output/1e6 * 500). The shape is chosen to reflect a typical single-turn agent task (a moderate prompt with a half-page response) and to keep the time-series comparable across models with different input-output spreads. Other prompt shapes will rank models differently; the 1k/500 shape is published alongside the raw per-token prices so any reader can recompute under their own assumptions.
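The normalization above can be written as a small function. This is a minimal sketch rather than the study's own tooling: the name task_cost and its parameter names are illustrative, and the two sample prices are list prices quoted elsewhere in this study.

```python
def task_cost(input_per_m: float, output_per_m: float,
              input_tokens: int = 1_000, output_tokens: int = 500) -> float:
    """Dollar cost of one task, given list prices in $ per million tokens.

    Defaults to the study's reference shape: 1,000 input + 500 output tokens.
    """
    return (input_per_m / 1e6 * input_tokens) + (output_per_m / 1e6 * output_tokens)


# GPT-5 at $1.25 input / $10.00 output per million tokens.
print(f"GPT-5:       ${task_cost(1.25, 10.00):.6f}")   # $0.006250

# GPT-4 Turbo at $10.00 input / $30.00 output per million tokens.
print(f"GPT-4 Turbo: ${task_cost(10.00, 30.00):.6f}")  # $0.025000

# A reader's own prompt shape can be substituted for the defaults.
print(f"GPT-5, 4k in / 2k out: ${task_cost(1.25, 10.00, 4_000, 2_000):.6f}")
```

Passing different token counts is how a reader would recompute the series under their own workload assumptions.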

Open dataset and per-version aggregates are at data.json under a CC-BY 4.0 license. For broader context on AI tool pricing across the consumer and prosumer layers, see our companion study AI Tools Statistics 2026.

Limitations and exclusions: Enterprise discounts and committed-use pricing are opaque and excluded; real spend at scale is typically 20-50% below list. Batch API pricing (often 50% off list) is excluded for comparability. Prompt caching discounts (now offered by OpenAI, Anthropic, Google, and DeepSeek) are excluded; including caching would push the effective floor lower. Fine-tuned model inference is excluded. Vision, audio, and embedding pricing are excluded; only text input and output tokens are tracked. Llama and Mistral pricing reflects hosted endpoints (Together, Fireworks); self-hosted economics differ. The reference-task framing is a single prompt shape, so model rankings can change on output-heavy or input-heavy workloads.

Finding 1: Roughly 500x cost decay from the GPT-4 32K peak to the frontier-mini floor

Headline: $0.12 per task on GPT-4 32K (Mar 2023) compressed to $0.000225 per task on Gemini 1.5 Flash (May 2024), a 533x cut in 14 months

Cost per task fell roughly 530x when measured peak-to-floor across the full 32-version dataset, from $0.12 on GPT-4 32K to $0.000225 on Gemini 1.5 Flash. Measured within the frontier-mini tier alone, the drop is about 13x, from $0.003 on GPT-3.5 Turbo to the same floor. The trajectory is not smooth; it is a staircase of step-changes triggered by each new generation. GPT-3.5 Turbo set the first floor at $0.003 per task in March 2023, and the 0125 revision lowered it to $0.00125 in January 2024. The next floor came from Claude 3 Haiku at $0.000875 in March 2024. Gemini 1.5 Flash dropped that to $0.000225 in May 2024. GPT-4o mini launched at $0.00045 in July 2024, twice the Flash floor, and Gemini 2.0 Flash launched at $0.0003 in February 2025, so the Flash line still held the floor at that point. The Stanford HAI 2024 AI Index report ("Cost of inference"), Epoch AI's compute trends work, and the State of AI annual report all corroborate the order-of-magnitude shape, but no public dataset has shipped the per-version raw series with primary citations until this study.

Figure: Frontier-mini cost per reference task ($, log scale), March 2023 to May 2026. X-axis: effective date. Y-axis ticks: $0.00001 to $0.1. Labeled points: GPT-4 32K $0.12, GPT-3.5 $0.003, GPT-3.5 0125 $0.00125, Claude 3 Haiku $0.000875, Gemini 1.5 Flash $0.000225, GPT-4o mini $0.00045, Claude 3.5 Haiku $0.0028, Gemini 2.0 Flash $0.0003, Claude 4 Haiku $0.0035.
Source: Nesyona price-decay dataset, computed from each provider's pricing page on the effective date (with archive.org fallback). Cost per task = (input/1M * 1,000) + (output/1M * 500). The frontier-mini floor (green line) fell ~13x from GPT-3.5 to Gemini 1.5 Flash; the GPT-4 32K peak (red dot) puts the full peak-to-floor decay at roughly 530x.

Finding 2: Claude Opus held its launch price for 26 months across two model generations

Headline: $15 input / $75 output per million from Mar 2024 (Opus 3) through May 2026 (Opus 4)

Anthropic priced Claude 3 Opus at $15 input and $75 output per million tokens at launch on March 4, 2024, and priced Claude 4 Opus at the identical $15/$75 on May 22, 2025. Across the same window, Sonnet pricing held flat at $3/$15, while Haiku stepped up with each generation: Claude 3 Haiku at $0.25/$1.25 (March 2024), Claude 3.5 Haiku at $0.80/$4 (November 2024), Claude 4 Haiku at $1/$5 (December 2025). The strategic read: Anthropic chose to ship a faster, smarter model at the same price rather than cut price on the flagship, while letting the price-sensitive lower tiers absorb the competitive pressure. This is the opposite of OpenAI's pattern, where each GPT-4 successor cut flagship list price (GPT-4 to GPT-4 Turbo to GPT-4o).

Figure: Claude input price per million tokens, by tier ($, log scale), May 2023 to May 2026. X-axis: effective date. Y-axis ticks: $0.10 to $100. Opus stays flat at $15; Sonnet stays flat at $3 from launch; Haiku ladders from $0.25 to $0.80 to $1. Earlier points are labeled C1 ($11) and C2 ($8).
Source: Nesyona price-decay dataset. Anthropic Opus list price has not changed since Mar 2024 across Opus 3 and Opus 4. Sonnet held flat at $3/$15 across three generations. Haiku is the only tier where input price actually rose between Claude 3 Haiku and Claude 4 Haiku.

Finding 3: Gemini Flash sets the price floor every release

Headline: 1.5 Flash at $0.075 input / $0.30 output (May 2024); 2.0 Flash at $0.10 / $0.40 (Feb 2025)

Gemini Flash undercuts the closed-weight competitive set at every launch. At the May 2024 1.5 Flash launch, the next cheapest closed-weight option was Claude 3 Haiku at $0.25 input, 3.3x more expensive. By the GPT-4o mini launch two months later at $0.15 input, Gemini was still 2x cheaper. Gemini 2.0 Flash in Feb 2025 held the floor at $0.10 input. The undercut is structural rather than promotional: Google is the only frontier-model vendor that owns its full TPU stack and can amortize the silicon across consumer Search and Workspace AI workloads, which lets Flash list price approach marginal cost rather than including a sales-cycle margin. This pattern shows up in every Stanford HAI AI Index chapter on inference cost since 2024.

Figure: Input price per million tokens at frontier-mini comparison points (bar chart). May 2024 window: Gemini 1.5 Flash $0.075, GPT-4o mini $0.15 (Jul 24), Claude 3 Haiku $0.25, DeepSeek V2 $0.14. Feb 2025 window: Gemini 2.0 Flash $0.10, GPT-4o mini $0.15, Claude 3.5 Haiku $0.80, DeepSeek V3 $0.27.
Source: Nesyona price-decay dataset. At each Gemini Flash release, the closed-weight competitive set sits 1.5x to 8x above the Flash input list price. DeepSeek V2 (May 2024) is the only model in the comparison set that undercut 1.5 Flash on either axis at launch, and only on output ($0.28 versus $0.30); its $0.14 input price was nearly double Flash's $0.075.

Finding 4: Open-weight hosted pricing leads each closed-weight cost cut by roughly one quarter

Headline: DeepSeek V2 at $0.14/M input (May 2024) preceded GPT-4o mini at $0.15/M (Jul 2024) by 73 days

DeepSeek V2 listed input pricing at $0.14 per million on May 6, 2024, two months before GPT-4o mini launched at $0.15 per million on July 18, 2024. Llama 3 70B on Together had been at $0.90/$0.90 since April 2024. The pattern: open-weight hosted endpoints establish the price floor that closed-weight providers then match at the next product launch. The mechanism is competitive rather than coincidental. Together AI, Fireworks, and DeepSeek itself publish API pricing the same week the model weights become public, which establishes a marginal-cost reference rate (compute plus modest hosting margin). Closed-weight providers then have one quarter to ship a competitive frontier-mini at or near the open-weight rate before defection accelerates. Per Meta's Llama 3.1 announcement and the corresponding Together pricing page, Llama 3.1 405B priced at $3.50/$3.50 in July 2024 sat between GPT-4 Turbo ($10/$30) and the next mini cycle.
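The 73-day lead in the headline follows directly from the two effective dates quoted above. A minimal sketch using only the Python standard library; the helper name lead_days is illustrative.

```python
from datetime import date


def lead_days(open_weight_launch: date, closed_weight_launch: date) -> int:
    """Days by which an open-weight launch preceded a closed-weight launch."""
    return (closed_weight_launch - open_weight_launch).days


deepseek_v2 = date(2024, 5, 6)    # DeepSeek V2 at $0.14/M input
gpt_4o_mini = date(2024, 7, 18)   # GPT-4o mini at $0.15/M input

print(lead_days(deepseek_v2, gpt_4o_mini))  # 73
```

At 73 days the gap is a little under one calendar quarter (roughly 91 days), which is the sense in which the lead is described as "roughly one quarter".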

Figure: Frontier-mini launches, open-weight (orange) versus closed-weight (purple), April 2024 to February 2025. X-axis: launch date (Apr 24, May 24, Jul 24, Dec 24, Feb 25). Points, as input/output $ per million tokens: Llama 3 70B $0.90/$0.90, DeepSeek V2 $0.14/$0.28, Gemini 1.5 Flash $0.075/$0.30, Llama 3.1 405B $3.50/$3.50, GPT-4o mini $0.15/$0.60, DeepSeek V3 $0.27/$1.10, Gemini 2.0 Flash $0.10/$0.40. Annotation: open-weight leads closed-weight cuts by ~73 days median.
Source: Nesyona price-decay dataset. Orange dots are open-weight hosted launches (Together, Fireworks, DeepSeek direct). Purple dots are closed-weight launches (OpenAI, Google, Anthropic). Each open-weight launch establishes a competitive floor that the next closed-weight mini-cycle release matches within one quarter.

Finding 5: Output is now 8x more expensive than input on the GPT-5 tier

Headline: GPT-3.5 was 1:1 input-output; GPT-5 is 1:8

The input-output spread widened from 1:1 on GPT-3.5 (March 2023) to 1:8 on GPT-5 (August 2025). Claude consistently runs 1:5 across all tiers (Opus, Sonnet, Haiku), Gemini Flash runs 1:4, and DeepSeek V3 runs roughly 1:4 ($0.27 input, $1.10 output). The widening spread reflects compute economics: input prefill is parallelizable across the prompt, while output generation is autoregressive and sequential. As context windows grew from 4K (GPT-3.5) to 1M (Gemini 1.5) and 400K (GPT-5), prefill economics improved faster than generation economics. The pricing implication for builders: output-heavy workloads (writing, reasoning, code generation) face a smaller cost cut than input-heavy workloads (RAG, summarization, classification). The headline decay figure therefore does not apply to a coding agent generating 5K tokens of code per turn.
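To make the workload point concrete, the sketch below compares how much cheaper GPT-5 ($1.25/$10) is than the original GPT-4 8K ($30/$60) under three prompt shapes. The list prices are those quoted in this study; the output-heavy and input-heavy shapes and the function names are assumptions chosen for illustration.

```python
def task_cost(input_per_m, output_per_m, input_tokens, output_tokens):
    """Dollar cost of one call at list price ($ per million tokens)."""
    return (input_per_m * input_tokens + output_per_m * output_tokens) / 1e6


GPT_4_8K = (30.00, 60.00)   # March 2023 list price, 1:2 spread
GPT_5 = (1.25, 10.00)       # August 2025 list price, 1:8 spread

# Prompt shapes as (input tokens, output tokens); only the first is the study's.
SHAPES = {
    "reference (1k in / 500 out)": (1_000, 500),
    "output-heavy (1k in / 5k out)": (1_000, 5_000),
    "input-heavy (10k in / 500 out)": (10_000, 500),
}


def cost_cut(old, new, shape):
    """How many times cheaper `new` is than `old` for the given shape."""
    return task_cost(*old, *shape) / task_cost(*new, *shape)


for name, shape in SHAPES.items():
    print(f"{name:<32} {cost_cut(GPT_4_8K, GPT_5, shape):.1f}x")
```

Under these assumptions the cut is about 9.6x on the reference shape, about 6.4x on the output-heavy shape, and about 18.9x on the input-heavy shape, which is the asymmetry the paragraph describes.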

Figure: Output / input price ratio across flagship models (bar chart; higher = output more expensive). Bars: GPT-3.5 0301 (Mar 23) 1.0x, GPT-4 (Mar 23) 2.0x, GPT-4 Turbo (Nov 23) 3.0x, GPT-4o (Aug 24) 4.0x, Claude Opus 3 / 4 5.0x, Gemini 2.0 Flash 4.0x, DeepSeek V3 4.1x, GPT-5 (Aug 25) 8.0x.
Source: Nesyona price-decay dataset. The 1:1 GPT-3.5 baseline reflected early uniform compute pricing. By GPT-5, output is priced 8x input, which reflects the structural difference between parallelizable prefill and sequential generation. Claude has been the most consistent at 1:5 across all generations and tiers.

Finding 6: Intelligence-per-dollar improved roughly 12x in 23 months on the frontier-mini tier

Headline: $/MMLU-point fell from ~$0.000043 (GPT-3.5 Mar 23, MMLU 70) to ~$0.0000034 (Gemini 2.0 Flash Feb 25, MMLU 87)

Cost-per-task is only half the story; the other half is capability per dollar. Using MMLU as the most-published benchmark and dividing cost-per-reference-task by MMLU score gives a rough intelligence-per-dollar proxy. On the frontier-mini tier, GPT-3.5 Turbo at $0.003 per task and MMLU 70 gives about $0.000043/MMLU-point, while Gemini 2.0 Flash at $0.0003 per task and MMLU 87 gives roughly $0.0000034, about a 12x improvement in 23 months. On the flagship tier, GPT-4 in March 2023 at $0.06 per task and MMLU 86 gives $0.0007/point, while GPT-5 in August 2025 at $0.00625 per task and an estimated MMLU 91 gives roughly $0.000069/point, about a 10x improvement across 29 months. Per month, intelligence-per-dollar is improving faster at the bottom of the price ladder than at the top, which is consistent with the Gemini Flash and DeepSeek pattern in Findings 3 and 4. MMLU scores are cited from Stanford HAI AI Index 2025, Hendrycks et al. 2020 (MMLU), and provider technical reports.

Figure: Cost per MMLU point ($, log scale), March 2023 to May 2026. X-axis: effective date (Mar 23, Jan 24, Jul 24, Feb 25, Aug 25). Y-axis ticks: $0.000001 to $0.001. Plotted models: GPT-3.5, GPT-4 (flagship), GPT-4o, GPT-3.5 0125, C3 Haiku, G1.5 Flash, GPT-4o mini, DeepSeek V3, G2.0 Flash, GPT-5. Frontier-mini in green; flagship in red dashed.
Source: Nesyona price-decay dataset; MMLU scores from provider technical reports and Stanford HAI AI Index 2025. The frontier-mini line dropped roughly 12x in 23 months. The flagship line dropped roughly 10x in 29 months, so intelligence-per-dollar is improving somewhat faster at the bottom of the ladder.
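The proxy is a single division, sketched below for the two flagship data points quoted in this finding (GPT-4 at $0.06 per task and MMLU 86; GPT-5 at $0.00625 per task and an estimated MMLU 91). The function name is illustrative, and any other model's cost and score can be substituted.

```python
def cost_per_mmlu_point(task_cost_usd: float, mmlu_score: float) -> float:
    """Cost of the reference task divided by the model's MMLU score."""
    return task_cost_usd / mmlu_score


gpt_4 = cost_per_mmlu_point(0.06, 86)      # March 2023 flagship
gpt_5 = cost_per_mmlu_point(0.00625, 91)   # August 2025 flagship, MMLU estimated

print(f"GPT-4: ${gpt_4:.6f} per point")
print(f"GPT-5: ${gpt_5:.6f} per point")
print(f"Improvement: {gpt_4 / gpt_5:.1f}x")
```

Because the MMLU score only moves from 86 to 91, almost all of the improvement comes from the price term; the proxy is best read as a price series lightly adjusted for capability.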

Finding 7: GPT-4 list price compressed 12x in 17 months

Headline: $30/$60 (Mar 2023 GPT-4 8K) to $2.50/$10 (Aug 2024 GPT-4o); 12x input, 6x output

OpenAI cut GPT-4 generation flagship list price three times in 17 months. The original GPT-4 launched March 2023 at $30 input / $60 output per million on the 8K context, with $60/$120 on the 32K variant. GPT-4 Turbo in November 2023 cut to $10/$30, a 3x input compression. GPT-4o in May 2024 cut to $5/$15. The August 2024 GPT-4o refresh halved input again to $2.50 and cut output from $15 to $10. Across the same 17-month window, OpenAI also launched GPT-4o mini at $0.15/$0.60, a 200x cut from original GPT-4 input pricing on the small tier. The pace is roughly one flagship cut every six months, with new mini tiers in between, which sets the cadence the rest of the market now competes against.

Figure: GPT-4 generation input price per million tokens ($, log scale), March 2023 to August 2024. X-axis: effective date (Mar 23, Nov 23, May 24, Aug 24). Y-axis ticks: $0.10 to $100. Flagship (red): GPT-4 8K $30, GPT-4 32K $60, GPT-4 Turbo $10, GPT-4o May $5, GPT-4o Aug $2.50. Mini (green): GPT-4o mini Jul $0.15.
Source: Nesyona price-decay dataset. OpenAI cut GPT-4 flagship input price three times in 17 months, plus a 200x mini-tier launch. The cadence of roughly one flagship cut per six months sets the market clock.
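The compression multiples in this finding can be recomputed from the listed price points. A minimal sketch; the list layout and the helper name step_multiples are illustrative, and the prices are the ones quoted above.

```python
# (label, input $/M, output $/M) for each flagship price point in this finding.
FLAGSHIP = [
    ("GPT-4 8K, Mar 2023", 30.00, 60.00),
    ("GPT-4 Turbo, Nov 2023", 10.00, 30.00),
    ("GPT-4o, May 2024", 5.00, 15.00),
    ("GPT-4o, Aug 2024", 2.50, 10.00),
]
GPT_4O_MINI_INPUT = 0.15  # $/M input, July 2024


def step_multiples(series):
    """Yield (label, input cut, output cut) for each successive price change."""
    for (_, prev_in, prev_out), (label, cur_in, cur_out) in zip(series, series[1:]):
        yield label, prev_in / cur_in, prev_out / cur_out


for label, in_cut, out_cut in step_multiples(FLAGSHIP):
    print(f"{label:<22} input {in_cut:.1f}x  output {out_cut:.1f}x")

first, last = FLAGSHIP[0], FLAGSHIP[-1]
total_in = first[1] / last[1]              # cumulative input compression
total_out = first[2] / last[2]             # cumulative output compression
mini_cut = first[1] / GPT_4O_MINI_INPUT    # original GPT-4 input vs GPT-4o mini input
print(f"cumulative: input {total_in:.0f}x, output {total_out:.0f}x, mini {mini_cut:.0f}x")
```

The cumulative line reproduces the 12x input, 6x output, and 200x mini-tier figures in the headline and text.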

Finding 8: DeepSeek V3 raised price over V2, one of the few successors that did

Headline: V2 input $0.14, V3 input $0.27 (1.9x); V2 output $0.28, V3 output $1.10 (3.9x)

Successor price increases are rare across the 32 versions tracked, and DeepSeek V3 is one of them (the Claude Haiku tier in Finding 10 is another): V3 priced at $0.27 input and $1.10 output per million on December 26, 2024, up from V2's $0.14/$0.28 in May 2024. The mechanism is straightforward: V3 is a substantially larger MoE model (671B total, 37B active) with materially higher capability than V2, and DeepSeek captured some of that capability gain in price. After the increase, V3 sits above GPT-4o mini's $0.15/$0.60 on both axes but below Claude 3.5 Haiku's $0.80/$4 on both. This does not break the broader decay pattern; instead it confirms that within a vendor's own ladder, capability-per-dollar improves but absolute price can rise when the model class moves up. Reported in DeepSeek's pricing documentation and the V3 technical paper.

Finding 9: Context window pricing is no longer separately tiered for most providers

Headline: GPT-4 charged 2x for 32K vs 8K in 2023; GPT-4o offers 128K at one rate

In March 2023, OpenAI charged $60/$120 for the 32K variant of GPT-4 versus $30/$60 for the 8K version, a clean 2x premium for context. By May 2024 GPT-4o offered 128K at a single price tier; Anthropic priced the full 200K Claude 3 family at flat per-token rates; Google priced 1M-context Gemini 1.5 Pro at flat rates with a small step at over 128K input. Only Gemini retains a context-window pricing tier (the over-128K input surcharge), and DeepSeek removed its V2 over-32K premium in V3. The economic implication: builders no longer pay a context-shape penalty, and prompt-stuffing strategies (long system prompts, RAG context injection) became viable at flat rates. This compounds the cost-per-task decay because the same 1,000-input-token reference shape now sits well within every model's flat-rate context band.

Finding 10: Claude 3.5 Haiku raised input price 3.2x over Claude 3 Haiku

Headline: $0.25 input on Claude 3 Haiku; $0.80 on Claude 3.5 Haiku

The Haiku tier is the only Anthropic tier that has moved on price across generations, and it moved up. Claude 3 Haiku launched March 2024 at $0.25 input / $1.25 output; Claude 3.5 Haiku in November 2024 priced at $0.80 input / $4 output, a 3.2x raise on both axes. Claude 4 Haiku in December 2025 settled at $1/$5, a smaller 1.25x step. The pattern matches DeepSeek V3 (Finding 8), confirming that within-vendor mini-tier price is non-monotonic when the underlying model class steps up substantially. GPT-4o is not a comparable case: its May 2024 $5 input was a cut from GPT-4 Turbo's $10, and the August refresh cut it again to $2.50. The Anthropic Haiku case is conspicuous because Sonnet and Opus stayed flat across the same generations.
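The successor price raises discussed here and for DeepSeek V3 reduce to a ratio per axis. A minimal sketch using the list prices quoted in this study; the dictionary layout and the name price_multiple are illustrative.

```python
# (input $/M, output $/M) for each predecessor and successor pair.
SUCCESSORS = {
    "DeepSeek V2 -> V3": ((0.14, 0.28), (0.27, 1.10)),
    "Claude 3 Haiku -> 3.5 Haiku": ((0.25, 1.25), (0.80, 4.00)),
    "Claude 3.5 Haiku -> 4 Haiku": ((0.80, 4.00), (1.00, 5.00)),
}


def price_multiple(old, new):
    """Return (input multiple, output multiple); values above 1 are a raise."""
    return new[0] / old[0], new[1] / old[1]


for name, (old, new) in SUCCESSORS.items():
    in_mult, out_mult = price_multiple(old, new)
    print(f"{name:<30} input {in_mult:.2f}x  output {out_mult:.2f}x")
```

The Haiku steps are symmetric across input and output (3.2x, then 1.25x), while DeepSeek's raise is weighted toward output (about 1.9x on input versus 3.9x on output).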

Dataset preview

Twelve of the 32 priced versions in the dataset; the full dataset is at data.json under CC-BY 4.0:

Model                  | Provider        | In $/M | Out $/M | Task $  | Effective
GPT-3.5 Turbo (0301)   | OpenAI          | 2.00   | 2.00    | 0.0030  | 2023-03-01
GPT-4 (32K)            | OpenAI          | 60.00  | 120.00  | 0.1200  | 2023-03-14
Claude 1.3             | Anthropic       | 11.02  | 32.68   | 0.0274  | 2023-05-01
Llama 2 70B (Together) | Meta / Together | 0.90   | 0.90    | 0.0014  | 2023-07-18
GPT-4 Turbo            | OpenAI          | 10.00  | 30.00   | 0.0250  | 2023-11-06
GPT-3.5 Turbo (0125)   | OpenAI          | 0.50   | 1.50    | 0.0013  | 2024-01-25
Claude 3 Haiku         | Anthropic       | 0.25   | 1.25    | 0.0009  | 2024-03-13
Claude 3 Opus          | Anthropic       | 15.00  | 75.00   | 0.0525  | 2024-03-04
Gemini 1.5 Flash       | Google          | 0.075  | 0.30    | 0.00023 | 2024-05-14
DeepSeek V2            | DeepSeek        | 0.14   | 0.28    | 0.00028 | 2024-05-06
GPT-4o mini            | OpenAI          | 0.15   | 0.60    | 0.00045 | 2024-07-18
GPT-5                  | OpenAI          | 1.25   | 10.00   | 0.00625 | 2025-08-07
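The Task $ column can be recomputed from the two price columns with the reference-task formula. The sketch below hard-codes the twelve preview rows instead of reading data.json, because the file's schema is not shown here; the tuple layout and names are illustrative.

```python
# (model, input $/M, output $/M, published Task $) copied from the preview rows.
ROWS = [
    ("GPT-3.5 Turbo (0301)", 2.00, 2.00, 0.0030),
    ("GPT-4 (32K)", 60.00, 120.00, 0.1200),
    ("Claude 1.3", 11.02, 32.68, 0.0274),
    ("Llama 2 70B (Together)", 0.90, 0.90, 0.0014),
    ("GPT-4 Turbo", 10.00, 30.00, 0.0250),
    ("GPT-3.5 Turbo (0125)", 0.50, 1.50, 0.0013),
    ("Claude 3 Haiku", 0.25, 1.25, 0.0009),
    ("Claude 3 Opus", 15.00, 75.00, 0.0525),
    ("Gemini 1.5 Flash", 0.075, 0.30, 0.00023),
    ("DeepSeek V2", 0.14, 0.28, 0.00028),
    ("GPT-4o mini", 0.15, 0.60, 0.00045),
    ("GPT-5", 1.25, 10.00, 0.00625),
]


def task_cost(input_per_m, output_per_m):
    """Reference task: 1,000 input tokens plus 500 output tokens."""
    return (input_per_m / 1e6 * 1_000) + (output_per_m / 1e6 * 500)


# Largest gap between the published value and the recomputed one.
# Published values are rounded, so small gaps are expected.
max_gap = max(abs(task_cost(inp, out) - published) for _, inp, out, published in ROWS)

for model, inp, out, published in ROWS:
    print(f"{model:<24} published {published:<8} recomputed {task_cost(inp, out):.6f}")
print(f"largest rounding gap: {max_gap:.6f}")
```

Every row agrees with the formula to within rounding of the published figure (the largest gap is about $0.00005).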

This study is part of Nesyona's AI tool economy research line.

Embed this study

Embed the headline chart

Free to embed under CC-BY 4.0. Attribution requires a link to nesyona.com/research/ai-token-price-decay-2026.

<iframe src="https://nesyona.com/research/ai-token-price-decay-2026/embed/decay-curve.html"
  width="100%" height="400" frameborder="0" loading="lazy"
  title="AI API token price decay 2022-2026, Nesyona Research">
</iframe>

Press kit and downloads

Full press kit at /press/, machine-readable dataset at data.json. Media inquiries via the contact on the press page.
