Nesyona Research // Press Kit

Press Kit: AI API Token Price Decay 2022-2026

Published 2026-05-09 · Embargo: none, available for immediate publication · Read the full study

Dataset summary

Five quotable findings

"The cost of a standard task with 1,000 input tokens and 500 output tokens fell roughly a thousandfold on the frontier-mini tier between March 2023 and February 2025. GPT-4 32K cost nine cents per task; Gemini 2.0 Flash now costs about two-hundredths of a cent. The decay rate is fastest at the bottom of the price ladder, not the top."

Attribute to: Vincent, Nesyona Research, in Nesyona's 2026 AI API Token Price Decay study

"Claude Opus has not changed price in 26 months. Anthropic priced Opus 3 at fifteen dollars input and seventy-five dollars output per million tokens in March 2024 and shipped Claude 4 Opus at the identical numbers in May 2025. Across the same window, every other frontier flagship has cut list price at least once. Anthropic chose to ship a faster model at the same price rather than cut price on the flagship."

Attribute to: Vincent, Nesyona Research

"Gemini Flash sets the price floor with every release. At the May 2024 launch, the next-cheapest closed-weight option was three times more expensive. Google is the only frontier-model vendor that owns its full TPU stack and can amortize silicon across consumer Search workloads, which lets the Flash list price approach marginal compute cost rather than carry a sales-cycle margin."

Attribute to: Vincent, Nesyona Research

"Open-weight hosted pricing leads each closed-weight cost cut by roughly one quarter. DeepSeek V2 was priced at fourteen cents per million input tokens in May 2024. Two months later, GPT-4o mini matched it at fifteen cents. Together AI and Fireworks publish open-weight pricing the week the weights become public, which establishes a marginal-cost reference rate that closed-weight providers then chase at their next product launch."

Attribute to: Vincent, Nesyona Research

"Output is now eight times more expensive than input on the GPT-5 tier. GPT-3.5 was one-to-one. The widening reflects compute economics: prefill is parallelizable across the prompt while generation is sequential. Builders running output-heavy workloads (writing, reasoning, code generation) face a meaningfully smaller cost cut than the headline thousand-x decay implies."

Attribute to: Vincent, Nesyona Research
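
The per-task arithmetic behind the findings above can be sketched as follows. This is a minimal illustration, not the study's methodology: the function names `task_cost_usd` and `output_share` are hypothetical, and the only inputs are figures quoted above (the standard task of 1,000 input and 500 output tokens, the Opus list price of $15 input and $75 output per million tokens, and the 8:1 and 1:1 output-to-input price ratios).

```python
def task_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in USD of one task, given list prices in USD per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


def output_share(input_tokens, output_tokens, output_to_input_ratio):
    """Fraction of a task's cost that comes from output tokens.

    Only the price ratio matters, so the input price is normalized to 1.
    """
    input_cost = input_tokens * 1.0
    output_cost = output_tokens * output_to_input_ratio
    return output_cost / (input_cost + output_cost)


# Standard task quoted in the study: 1,000 input tokens, 500 output tokens.
STANDARD_TASK = (1_000, 500)

# Opus list price quoted above: $15 input, $75 output per million tokens.
opus_cost = task_cost_usd(*STANDARD_TASK, 15.0, 75.0)
print(f"Opus, standard task: ${opus_cost:.4f}")

# Output share of task cost at the quoted 8:1 and 1:1 price ratios.
share_8x = output_share(*STANDARD_TASK, 8.0)
share_1x = output_share(*STANDARD_TASK, 1.0)
print(f"Output share at 8:1 pricing: {share_8x:.0%}")
print(f"Output share at 1:1 pricing: {share_1x:.0%}")
```

Under these assumptions, output tokens account for 80% of the standard task's cost at 8:1 pricing, against one third at 1:1. That is why a workload's output-heavy share largely determines how much of a headline price cut it actually sees.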

Suggested coverage angles

This study lands at the intersection of AI infrastructure economics, vendor strategy, and developer tooling. Suggested angles for journalists:

Suggested press targets

Downloads

Embed the headline chart

Free to embed under CC-BY 4.0 with link attribution to the study page:

<iframe src="https://nesyona.com/research/ai-token-price-decay-2026/embed/decay-curve.html"
  width="100%" height="400" frameborder="0" loading="lazy"
  title="AI API token price decay 2022 to 2026, Nesyona Research">
</iframe>

Media contact

Vincent, Nesyona Research

Email: [email protected] [PRESS-EMAIL-PENDING]

Available for: written quotes (24-hour turnaround), data drill-down requests, podcast or video interviews, and on-record commentary on AI API pricing trends, vendor strategy, and developer-cost implications.

Response time: typically same-day during ET business hours.

About Nesyona

Nesyona is an independent AI tool economy research site at nesyona.com. We track real pricing, free-tier longevity, capability boundaries, and cost-per-outcome across the AI product stack. Nesyona is part of the DeepSynthesis Lattice, a constellation of niche research sites covering ecommerce, finance, education, health, and AI verticals.
