Sources & references

Every figure on this site is traceable. Below: the independent estimates we compare against, the primary company disclosures behind the reported floor (each with an archived snapshot), and the research, analyst and hardware literature we draw on.

Independent estimates of the global total, and how each was derived

Third-party estimates, shown for comparison, not summed into our figures. Only the current global-scope estimates feed our headline band; forecasts and narrower or older figures are shown for context.

Source | Scope | T/day | As of | Methodology
Epoch AI / Exponential View (in band) | All providers (global) | ~432 | mid-2026 | Demand estimate relayed by Epoch from Exponential View (~5 billion tokens/sec); the underlying derivation is not published.
OpenRouter (1% extrapolation) (in band) | Global inference | ~400 | late-2025 | Extrapolation: OpenRouter's measured ~1T/day framed by its team as roughly 1% of global inference, implying ~400T/day.
Tomasz Tunguz (The Token Race) | Global (all providers) | ~88 | Sep-2025 | Top-down sum of provider disclosures (Google + Azure + neoclouds), plus an estimated OpenAI/Anthropic figure the author openly labels 'from thin air'.
a16z / OpenRouter | LLM API market | ~50 | late-2025 | First-party telemetry of routed tokens (measured, request-level). Authors explicitly decline to extrapolate to a global total.
Epoch AI (compute-derived) | Frontier lab (OpenAI) | 10–100 | late-2025 | Compute budget: installed H100-equivalents x FLOP/day / FLOP-per-token (2 x active params), at 5-30% inference utilization. Cross-checked against ~4B messages/day x ~4k tokens.
Goldman Sachs (Agentic Economy) | Global forecast 2030 | ~4,000 | forecast-2030 | Bottom-up agent-task simulation: tokens per task step x task frequency x knowledge-worker adoption (~12% by 2030); 120 quadrillion tokens/month.
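
The table mixes per-second, per-day and per-month figures. A minimal sketch of the unit conversions behind the T/day column follows, using only numbers quoted in the table. The function names and the 30-day month are illustrative assumptions, not part of any source's method.

```python
# Unit-conversion sketch for the T/day column.
# Assumptions (not from the sources): function names, and a 30-day month.

SECONDS_PER_DAY = 24 * 60 * 60
TRILLION = 1e12
QUADRILLION = 1e15


def per_second_to_trillions_per_day(tokens_per_second: float) -> float:
    """Convert a tokens/sec rate into trillions of tokens per day."""
    return tokens_per_second * SECONDS_PER_DAY / TRILLION


def per_month_to_trillions_per_day(tokens_per_month: float,
                                   days_per_month: float = 30.0) -> float:
    """Convert a monthly token total into trillions of tokens per day."""
    return tokens_per_month / days_per_month / TRILLION


# Exponential View's ~5 billion tokens/sec, as relayed by Epoch.
epoch_ev_t_per_day = per_second_to_trillions_per_day(5e9)

# Goldman Sachs' 120 quadrillion tokens/month forecast for 2030.
goldman_t_per_day = per_month_to_trillions_per_day(120 * QUADRILLION)

print(f"Epoch / Exponential View: ~{epoch_ev_t_per_day:.0f}T/day")
print(f"Goldman Sachs 2030:       ~{goldman_t_per_day:,.0f}T/day")
```

Under these assumptions the two conversions give roughly 432T/day and 4,000T/day, matching the table's rounded entries.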

A normalization caveat. "Tokens processed" (Google's metric: input + reasoning + free and internal usage) and "tokens generated" (output only) differ by a large factor, so estimates built on different definitions are not directly comparable. We flag each estimate's scope rather than forcing one definition.

Primary company disclosures

The sources behind the reported floor and observations, with archived snapshots.

Source | Original | Archive
Google I/O 2025 (restated in Alphabet Q2 2025 CEO remarks) | link | snapshot
Sundar Pichai at Google I/O 2026 | link | snapshot
Microsoft FY2025 Q3 earnings (50T in March) | link | pending
Microsoft FY2025 Q4 earnings call | link | snapshot
Sam Altman via Epoch AI usage dataset | link | pending
OpenRouter/a16z State of AI 2025 | link | snapshot
OpenAI statistics (15B tokens/min) | link | snapshot
Fireworks AI ($315M ARR; 10T/day) via press | link | pending
Fast Company (Lin Qiao interview) | link | snapshot
Fireworks AI homepage | link | pending
Tomasz Tunguz (The Token Race) | link | pending
National Data Bureau (Liu Liehong) via China News Service | link | snapshot
Volcano Engine via Robonomics Token Tracker | link | pending
Volcano Engine via China Daily | link | snapshot
Volcano Engine (Tan Dai) via KuCoin | link | pending
Business Insider (CEO Weinberg) via AOL | link | snapshot
Alphabet Q3 2025 CEO remarks | link | pending

Estimate sources

The revenue/usage anchors behind the implied estimates.

Entity | Basis | Source
Anthropic | ~$47B annualized run-rate (May 2026) / $3-8 per Mtok blended | link
OpenAI | 2.5B messages/day x 1,000-4,000 tokens/message (Epoch's full-context anchor); consumer only, API counted in the floor | link · archive
xAI | ~7-10M DAU x 5-20 msgs x 1.5-3k tokens; revenue ~$0.5B and usage mostly free, so the revenue-implied method fails | link · archive
Meta | 1B MAU (May 2025) x 8-13% DAU/MAU (~80-130M DAU) x 3-4 msgs x 1-2k tokens/message; Meta discloses users, never tokens | link
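
The anchors above imply daily token ranges by simple multiplication or division. The sketch below works through that arithmetic with the quoted inputs only. The `Range` helper, the function names, the 365-day year, and the simplification that all of Anthropic's run-rate is token revenue are illustrative assumptions, so the printed ranges are not the site's published figures.

```python
# Implied-estimate arithmetic, using only the anchors listed in the table.
# Assumptions (hypothetical helpers): Range, revenue_implied, in_trillions,
# a 365-day year, and run-rate treated as pure token revenue.
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def __mul__(self, other: "Range") -> "Range":
        # Valid for non-negative ranges: lows multiply, highs multiply.
        return Range(self.low * other.low, self.high * other.high)


def point(value: float) -> Range:
    """A degenerate range for a single-valued anchor."""
    return Range(value, value)


def in_trillions(tokens: Range) -> Range:
    return Range(tokens.low / 1e12, tokens.high / 1e12)


def revenue_implied(annual_run_rate_usd: float, usd_per_mtok: Range) -> Range:
    """Tokens/day implied by revenue and a blended price per million tokens."""
    daily_usd = annual_run_rate_usd / 365
    # A higher price buys fewer tokens, so the bounds swap.
    return Range(daily_usd / usd_per_mtok.high * 1e6,
                 daily_usd / usd_per_mtok.low * 1e6)


anthropic = in_trillions(revenue_implied(47e9, Range(3, 8)))
openai = in_trillions(point(2.5e9) * Range(1_000, 4_000))
xai = in_trillions(Range(7e6, 10e6) * Range(5, 20) * Range(1_500, 3_000))
meta = in_trillions(point(1e9) * Range(0.08, 0.13) * Range(3, 4) * Range(1_000, 2_000))

for name, r in [("Anthropic", anthropic), ("OpenAI", openai),
                ("xAI", xai), ("Meta", meta)]:
    print(f"{name:<10} {r.low:.2f} to {r.high:.2f} T/day")
```

Multiplying range endpoints gives the widest plausible band. It does not say where inside that band the true value sits.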

Research & academic

Source | Year | What it contributes
Epoch AI: AI Companies token-usage dataset (open CSV) | 2026 | Structured open dataset of per-company daily token figures with sources & confidence; corroborates our Google and OpenAI numbers
Epoch AI: Is a compute crunch coming? | 2026 | Estimates global throughput at ~5 billion tokens/sec (~432T/day) across all providers
Epoch AI: How many digital workers could OpenAI deploy? | 2025 | Compute-derived estimate of frontier token output (10–100T/day; GPT-5 ~19T/day)
Epoch AI: Frontier labs don't use most AI compute | 2026 | Global installed base ~15–16M H100-equivalents operational (end-2025)
Epoch AI: AI chip production / installed base | 2026 | Accelerator stock growing ~3.3x/year; NVIDIA >60% of total compute
NBER WP 34255: How People Use ChatGPT | 2025 | OpenAI-coauthored study: 700M weekly users sending ~2.5B messages/day
a16z + OpenRouter: State of AI: 100 Trillion Token Study | 2025 | Empirical study of 100T+ tokens of real LLM usage; LLM API market ~50T/day
State of AI empirical study (arXiv:2601.10088) | 2026 | Peer-archived version of the OpenRouter/a16z 100-trillion-token study
Stanford HAI: AI Index Report 2025 | 2025 | Inference price fell ~280x (Nov 2022–Oct 2024); context for volume growth
Photons = Tokens (arXiv:2603.06630): a global token balance sheet | 2026 | Energy divided by Wh-per-token supply estimate (~6.5e17 tokens/yr capacity by 2028) with a 2024 demand anchor of ~1e12 to 1e13 tokens/day
Erdil / Epoch: Inference economics of language models (arXiv:2506.04645) | 2025 | Roofline per-GPU throughput building blocks (e.g. H100 memory-bound floor ~42 tokens/sec/GPU single-stream)
Epoch AI: How much energy does ChatGPT use? | 2025 | Tokens-per-message anchors: ~500 output tokens/query (measured avg ~269), ~4,000 full-context tokens/message; 0.3 Wh/query
Meta: Llama usage doubled May through July 2024 | 2024 | Meta's only direct token-volume disclosure: Llama tokens served via CSP partners grew 10x Jan-Jul 2024 (growth multiples only, no absolute number)
NBER w34608: The Emerging Market for Intelligence | 2026 | API usage econometrics (OpenRouter + Azure) on token demand and price elasticity
Google: Measuring the Environmental Impact of AI at Google Scale (arXiv:2508.15734) | 2025 | 0.24 Wh per median Gemini prompt; energy divided by prompt count, not tokens (no per-token figure)
Epoch AI: Computing capacity (installed FLOP) | 2025 | ~20M H100-equivalents global compute stock, from revenue divided by chip price

Analyst & industry

Source | Year | What it contributes
Bond Capital: Trends in Artificial Intelligence (Mary Meeker) | 2025 | Inference cost down ~99.7% over two years; demand flywheel
Goldman Sachs: AI Agents Forecast to Boost Tech Cash Flow | 2025 | Forecasts ~24x token-demand growth by 2030 as agentic usage dominates
Tomasz Tunguz: The Token Race | 2025 | Cross-provider token tallies (Google
Tomasz Tunguz: Is Token Consumption Slowing Down? | 2025 | Google's additive monthly growth roughly halved mid-2025 (rate
Azeem Azhar: Exponential View (Magnitudes of intelligence) | 2026 | Source of the ~5 billion tokens/sec (~432T/day) global estimate; per-user token growth
OpenRouter / a16z: token usage by billing geography | 2025 | US ~47% / Europe ~18% / Asia ~13% of token spend; basis for the consumption-geography split in the country view
AI 2027: Compute Forecast | 2025 | Install-base-derived OpenAI token estimate: ~2T/day in 2024 scaling to ~80T/day by 2027
YipitData: cloud and LLM pricing trends | 2025 | Alt-data market-wide estimate of ~150T tokens/month (~2 quadrillion annualized)
Menlo Ventures: State of Generative AI in the Enterprise | 2025 | Enterprise survey (~495 US decision-makers); $37B enterprise AI spend; token usage up ~320x
OpenRouter: Series B announcement | 2026 | Platform processing 25T tokens/week (May 2026), up from 5T in six months
Morgan Stanley: AI market trends | 2026 | Uses token throughput (6.4T to 22.7T tokens/week) as a demand proxy; forecasts hardware in dollars, not token volume
Anthropic via Sentisight: GenAI usage by hour and day | 2026 | Weekday peak 8am-2pm ET (12-18 UTC) per Anthropic Claude usage; weekend visits down ~23%
Cloudflare Radar: AI Insights (time-of-day) | 2026 | Infrastructure-level GenAI traffic by time of day; weekdays consistently outpace weekends
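
Several rows above quote weekly throughput, while the rest of this page works in T/day. The sketch below converts the weekly figures to daily ones and derives the compound growth implied by OpenRouter's 5T to 25T tokens/week. It reads "up from 5T in six months" as 5T/week six months earlier, which is an interpretation. The function names are hypothetical.

```python
# Weekly-to-daily conversion and implied compound growth.
# Assumptions: function names are illustrative; growth is treated as smooth
# exponential growth between the two quoted OpenRouter data points.
import math

DAYS_PER_WEEK = 7


def weekly_to_daily(trillions_per_week: float) -> float:
    """Average T/day implied by a T/week figure."""
    return trillions_per_week / DAYS_PER_WEEK


def monthly_growth_factor(start: float, end: float, months: float) -> float:
    """Constant month-over-month multiplier taking start to end."""
    return (end / start) ** (1.0 / months)


def doubling_time_months(start: float, end: float, months: float) -> float:
    """Months needed to double at the implied exponential rate."""
    return months * math.log(2) / math.log(end / start)


openrouter_daily = weekly_to_daily(25.0)      # May 2026 figure
morgan_stanley_daily = (weekly_to_daily(6.4), weekly_to_daily(22.7))

factor = monthly_growth_factor(5.0, 25.0, 6)
doubling = doubling_time_months(5.0, 25.0, 6)

print(f"OpenRouter: ~{openrouter_daily:.1f}T/day")
print(f"Morgan Stanley range: ~{morgan_stanley_daily[0]:.1f} "
      f"to ~{morgan_stanley_daily[1]:.1f}T/day")
print(f"Implied growth: x{factor:.2f}/month, doubling every {doubling:.1f} months")
```

A 5x rise over six months works out to about 31% growth per month. The constant-rate assumption is a simplification, since the source gives only the two endpoints.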

Compute & hardware

Source | Year | What it contributes
NVIDIA: Blackwell leads on SemiAnalysis InferenceMAX | 2025 | B200 ~10
Artificial Analysis: Hardware benchmarks | 2026 | Per-GPU production throughput across models and accelerators
FlexPipe (arXiv:2510.11938): serving-pipeline efficiency | 2025 | GPU reservation falls ~75% to ~30% with better pipelining (a serving/allocation paper
Meta: The Llama 3 Herd of Models (arXiv:2407.21783) | 2024 | Best-case training MFU ~38–43% even on optimized 16K-H100 runs
Epoch AI via The Decoder: global AI compute ~15M H100e | 2026 | Global operational AI compute >15M H100-equivalents; >10 GW power
TokenPowerBench (arXiv:2512.03024): energy per token | 2025 | Whole-system ~40-60 J/token at high load on a 32xH100 cluster; GPU is >60% of energy
LMSYS: Large-scale expert parallelism (DeepSeek R1) | 2025 | ~2,788 tokens/sec/GPU decode for DeepSeek-R1 671B MoE on H100
SemiAnalysis: InferenceMAX open-source inference benchmark | 2025 | Per-GPU throughput primitives (B200 ~60,000 tokens/sec/GPU); aggregate Tokenomics model is subscriber-gated
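
The compute-budget method cited on this page (installed H100-equivalents x FLOP/day / FLOP-per-token, with FLOP-per-token taken as 2 x active parameters, at 5-30% inference utilization) can be sketched as follows. The fleet size, peak FLOP/s per H100 and active-parameter count below are hypothetical placeholders chosen for illustration. They are not Epoch's inputs, and the output is not an estimate for any real lab.

```python
# Compute-budget sketch: tokens/day from an installed accelerator fleet.
# Only the formula and the 5-30% utilization band come from the page.
# Every numeric input marked "hypothetical" is an illustrative assumption.

SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_day(h100_equivalents: float,
                   peak_flop_per_sec: float,
                   utilization: float,
                   active_params: float) -> float:
    """Daily tokens = usable FLOP/day divided by FLOP per token."""
    flop_per_token = 2.0 * active_params          # forward-pass approximation
    usable_flop_per_day = (h100_equivalents * peak_flop_per_sec
                           * SECONDS_PER_DAY * utilization)
    return usable_flop_per_day / flop_per_token


FLEET = 1e6            # hypothetical: H100-equivalents devoted to inference
PEAK_FLOP_PER_SEC = 1e15   # assumption: round figure for 16-bit dense throughput
ACTIVE_PARAMS = 1e11   # hypothetical: 100B active parameters per token

low = tokens_per_day(FLEET, PEAK_FLOP_PER_SEC, 0.05, ACTIVE_PARAMS) / 1e12
high = tokens_per_day(FLEET, PEAK_FLOP_PER_SEC, 0.30, ACTIVE_PARAMS) / 1e12

print(f"Illustrative band: {low:.1f} to {high:.1f} T/day")
```

The result scales linearly with fleet size and utilization and inversely with active parameters, so the width of any compute-derived band is driven mostly by the utilization assumption.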

A note on comparability. These sources do not all measure the same thing: input vs output tokens, one provider vs all surfaces, marketplace samples vs global totals, multimodal vs text. We preserve what each source reported and flag the differences rather than forcing them into one number. See the methodology for how we normalize and what we exclude.