Sources & references

Every figure on this site is traceable. Below: the independent estimates we compare against, the primary company disclosures behind the reported floor (each with an archived snapshot), and the research, analyst and hardware literature we draw on.

Independent estimates of the global total, and how each was derived

Third-party estimates, shown for comparison, not summed into our figures. Only the current global-scope estimates feed our headline band; forecasts and narrower or older figures are shown for context.

Source	Scope	T/day	As of	Methodology
Epoch AI / Exponential View (in band)	All providers (global)	~432	mid-2026	Demand estimate relayed by Epoch from Exponential View (~5 billion tokens/sec); the underlying derivation is not published.
OpenRouter (1% extrapolation) (in band)	Global inference	~400	late-2025	Extrapolation: OpenRouter's measured ~1T/day framed by its team as roughly 1% of global inference, implying ~400T/day.
Tomasz Tunguz (The Token Race)	Global (all providers)	~88	Sep-2025	Top-down sum of provider disclosures (Google + Azure + neoclouds), plus an estimated OpenAI/Anthropic figure the author openly labels 'from thin air'.
a16z / OpenRouter	LLM API market	~50	late-2025	First-party telemetry of routed tokens (measured, request-level). Authors explicitly decline to extrapolate to a global total.
Epoch AI (compute-derived)	Frontier lab (OpenAI)	10–100	late-2025	Compute budget: installed H100-equivalents x FLOP/day / FLOP-per-token (2 x active params), at 5-30% inference utilization. Cross-checked against ~4B messages/day x ~4k tokens.
Goldman Sachs (Agentic Economy)	Global forecast 2030	~4,000	forecast-2030	Bottom-up agent-task simulation: tokens per task step x task frequency x knowledge-worker adoption (~12% by 2030); 120 quadrillion tokens/month.
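
The T/day column restates figures that the sources quote per second, per minute, or per month. A minimal sketch of that unit conversion, assuming T means 10^12 tokens and a 30-day month (the month length is our assumption; the sources do not state one). The function name is illustrative only.

```python
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 1_440
DAYS_PER_MONTH = 30  # assumption: the sources do not state a month length
TRILLION = 1e12

# How many of each period fit into one day.
PERIODS_PER_DAY = {
    "second": SECONDS_PER_DAY,
    "minute": MINUTES_PER_DAY,
    "day": 1,
    "week": 1 / 7,
    "month": 1 / DAYS_PER_MONTH,
}


def to_trillions_per_day(tokens, per):
    """Convert a token rate (tokens per `per`) to trillions of tokens per day."""
    return tokens * PERIODS_PER_DAY[per] / TRILLION


print(to_trillions_per_day(5e9, "second"))    # Exponential View: about 432
print(to_trillions_per_day(120e15, "month"))  # Goldman Sachs 2030: about 4,000
print(to_trillions_per_day(15e9, "minute"))   # OpenAI 15B tokens/min: about 21.6
```

The conversion only changes units; it does not reconcile the differing scopes of the underlying figures.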

A normalization caveat. "Tokens processed" (Google's metric: input + reasoning + free and internal usage) and "tokens generated" (output only) differ by a large factor, so estimates built on different definitions are not directly comparable. We flag each estimate's scope rather than forcing one definition.

Primary company disclosures

The sources behind the reported floor and observations, with archived snapshots.

Source	Original	Archive
Google I/O 2025 (restated in Alphabet Q2 2025 CEO remarks)	link	snapshot
Sundar Pichai at Google I/O 2026	link	snapshot
Microsoft FY2025 Q3 earnings (50T in March)	link	pending
Microsoft FY2025 Q4 earnings call	link	snapshot
Sam Altman via Epoch AI usage dataset	link	pending
OpenRouter/a16z State of AI 2025	link	snapshot
OpenAI statistics (15B tokens/min)	link	snapshot
Fireworks AI ($315M ARR; 10T/day) via press	link	pending
Fast Company (Lin Qiao interview)	link	snapshot
Fireworks AI homepage	link	pending
Tomasz Tunguz (The Token Race)	link	pending
National Data Bureau (Liu Liehong) via China News Service	link	snapshot
Volcano Engine via Robonomics Token Tracker	link	pending
Volcano Engine via China Daily	link	snapshot
Volcano Engine (Tan Dai) via KuCoin	link	pending
Business Insider (CEO Weinberg) via AOL	link	snapshot
Alphabet Q3 2025 CEO remarks	link	pending

Estimate sources

The revenue/usage anchors behind the implied estimates.

Entity	Basis	Source
Anthropic	~$47B annualized run-rate (May 2026) / $3-8 per Mtok blended	link
OpenAI	2.5B messages/day x 1,000-4,000 tokens/message (Epoch's full-context anchor); consumer only, API counted in the floor	link · archive
xAI	~7-10M DAU x 5-20 msgs x 1.5-3k tokens; revenue is ~$0.5B but usage is mostly free, so a revenue-implied estimate fails	link · archive
Meta	1B MAU (May 2025) x 8-13% DAU/MAU (~80-130M DAU) x 3-4 msgs x 1-2k tokens/message; Meta discloses users, never tokens	link
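
The anchors in this table reduce to two pieces of arithmetic: revenue divided by a blended price per million tokens, and users times messages times tokens per message. A rough sketch using the ranges listed above follows. It assumes a 365-day year for the run-rate, the function names are hypothetical, and the printed ranges are plain arithmetic on the anchors, not necessarily the figures published elsewhere on the site.

```python
def revenue_implied(annual_revenue_usd, price_low_per_mtok, price_high_per_mtok):
    """Tokens/day (low, high) implied by annualized revenue and a blended $/Mtok range."""
    daily_revenue = annual_revenue_usd / 365  # assumption: 365-day year
    # A higher price buys fewer tokens per dollar, so the high price gives the low bound.
    low = daily_revenue / price_high_per_mtok * 1e6
    high = daily_revenue / price_low_per_mtok * 1e6
    return low, high


def usage_implied(units, msgs_per_unit, tokens_per_msg):
    """Each argument is a (low, high) pair; returns tokens/day as (low, high)."""
    low = units[0] * msgs_per_unit[0] * tokens_per_msg[0]
    high = units[1] * msgs_per_unit[1] * tokens_per_msg[1]
    return low, high


estimates = {
    "Anthropic": revenue_implied(47e9, 3, 8),
    # OpenAI's anchor is already in messages/day, so messages per unit is 1.
    "OpenAI": usage_implied((2.5e9, 2.5e9), (1, 1), (1_000, 4_000)),
    "xAI": usage_implied((7e6, 10e6), (5, 20), (1_500, 3_000)),
    "Meta": usage_implied((80e6, 130e6), (3, 4), (1_000, 2_000)),
}

for name, (low, high) in estimates.items():
    print(f"{name}: {low / 1e12:.2f} to {high / 1e12:.2f} T/day")
```

Multiplying all the low ends together and all the high ends together gives the widest possible band; a narrower band would need assumptions about how the inputs correlate, which the table does not provide.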

Research & academic

Source	Year	What it contributes
Epoch AI: AI Companies token-usage dataset (open CSV)	2026	Structured open dataset of per-company daily token figures with sources & confidence; corroborates our Google and OpenAI numbers
Epoch AI: Is a compute crunch coming?	2026	Estimates global throughput at ~5 billion tokens/sec (~432T/day) across all providers
Epoch AI: How many digital workers could OpenAI deploy?	2025	Compute-derived estimate of frontier token output (10–100T/day; GPT-5 ~19T/day)
Epoch AI: Frontier labs don't use most AI compute	2026	Global installed base ~15–16M H100-equivalents operational (end-2025)
Epoch AI: AI chip production / installed base	2026	Accelerator stock growing ~3.3x/year; NVIDIA >60% of total compute
NBER WP 34255: How People Use ChatGPT	2025	OpenAI-coauthored study: 700M weekly users sending ~2.5B messages/day
a16z + OpenRouter: State of AI: 100 Trillion Token Study	2025	Empirical study of 100T+ tokens of real LLM usage; LLM API market ~50T/day
State of AI empirical study (arXiv:2601.10088)	2026	arXiv version of the OpenRouter/a16z 100-trillion-token study
Stanford HAI: AI Index Report 2025	2025	Inference price fell ~280x (Nov 2022–Oct 2024); context for volume growth
Photons = Tokens (arXiv:2603.06630): a global token balance sheet	2026	Supply estimate from energy divided by Wh per token (~6.5e17 tokens/yr capacity by 2028), with a 2024 demand anchor of ~1e12 to 1e13 tokens/day
Erdil / Epoch: Inference economics of language models (arXiv:2506.04645)	2025	Roofline per-GPU throughput building blocks (e.g. H100 memory-bound floor ~42 tokens/sec/GPU single-stream)
Epoch AI: How much energy does ChatGPT use?	2025	Tokens-per-message anchors: ~500 output tokens/query (measured avg ~269), ~4,000 full-context tokens/message; 0.3 Wh/query
Meta: Llama usage doubled May through July 2024	2024	Meta's only direct token-volume disclosure: Llama tokens served via CSP partners grew 10x Jan-Jul 2024 (growth multiples only, no absolute number)
NBER w34608: The Emerging Market for Intelligence	2026	API usage econometrics (OpenRouter + Azure) on token demand and price elasticity
Google: Measuring the Environmental Impact of AI at Google Scale (arXiv:2508.15734)	2025	0.24 Wh per median Gemini prompt; energy divided by prompt count, not tokens (no per-token figure)
Epoch AI: Computing capacity (installed FLOP)	2025	~20M H100-equivalents global compute stock, from revenue divided by chip price

Analyst & industry

Source	Year	What it contributes
Bond Capital: Trends in Artificial Intelligence (Mary Meeker)	2025	Inference cost down ~99.7% over two years; demand flywheel
Goldman Sachs: AI Agents Forecast to Boost Tech Cash Flow	2025	Forecasts ~24x token-demand growth by 2030 as agentic usage dominates
Tomasz Tunguz: The Token Race	2025	Cross-provider token tallies (Google
Tomasz Tunguz: Is Token Consumption Slowing Down?	2025	Google's additive monthly growth roughly halved mid-2025 (rate
Azeem Azhar: Exponential View (Magnitudes of intelligence)	2026	Source of the ~5 billion tokens/sec (~432T/day) global estimate; per-user token growth
OpenRouter / a16z: token usage by billing geography	2025	US ~47% / Europe ~18% / Asia ~13% of token spend; basis for the consumption-geography split in the country view
AI 2027: Compute Forecast	2025	Install-base-derived OpenAI token estimate: ~2T/day in 2024 scaling to ~80T/day by 2027
YipitData: cloud and LLM pricing trends	2025	Alt-data market-wide estimate of ~150T tokens/month (~2 quadrillion annualized)
Menlo Ventures: State of Generative AI in the Enterprise	2025	Enterprise survey (~495 US decision-makers); $37B enterprise AI spend; token usage up ~320x
OpenRouter: Series B announcement	2026	Platform processing 25T tokens/week (May 2026), up from 5T/week six months earlier
Morgan Stanley: AI market trends	2026	Uses token throughput (6.4T to 22.7T tokens/week) as a demand proxy; forecasts hardware in dollars, not token volume
Anthropic via Sentisight: GenAI usage by hour and day	2026	Weekday peak 8am-2pm ET (12-18 UTC) per Anthropic Claude usage; weekend visits down ~23%
Cloudflare Radar: AI Insights (time-of-day)	2026	Infrastructure-level GenAI traffic by time of day; weekdays consistently outpace weekends

Compute & hardware

Source	Year	What it contributes
NVIDIA: Blackwell leads on SemiAnalysis InferenceMAX	2025	B200 ~10
Artificial Analysis: Hardware benchmarks	2026	Per-GPU production throughput across models and accelerators
FlexPipe (arXiv:2510.11938): serving-pipeline efficiency	2025	GPU reservation falls ~75% to ~30% with better pipelining (a serving/allocation paper
Meta: The Llama 3 Herd of Models (arXiv:2407.21783)	2024	Best-case training MFU ~38–43% even on optimized 16K-H100 runs
Epoch AI via The Decoder: global AI compute ~15M H100e	2026	Global operational AI compute >15M H100-equivalents; >10 GW power
TokenPowerBench (arXiv:2512.03024): energy per token	2025	Whole-system ~40-60 J/token at high load on a 32xH100 cluster; GPU is >60% of energy
LMSYS: Large-scale expert parallelism (DeepSeek R1)	2025	~2,788 tokens/sec/GPU decode for DeepSeek-R1 671B MoE on H100
SemiAnalysis: InferenceMAX open-source inference benchmark	2025	Per-GPU throughput primitives (B200 ~60,000 tokens/sec/GPU); aggregate Tokenomics model is subscriber-gated
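
The compute-derived method listed among the independent estimates (installed H100-equivalents x FLOP/day, divided by 2 x active parameters, at 5-30% inference utilization) can be sketched as below. Only the formula and the utilization range come from the sources; the fleet size, the rounded per-GPU peak of 1e15 FLOP/s, and the 100B active-parameter model are placeholder assumptions chosen for illustration, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def compute_implied_tokens_per_day(h100_equivalents, flop_per_sec_per_gpu,
                                   active_params, utilization):
    """Tokens/day a fleet could serve, using ~2 FLOP per active parameter per token."""
    flop_per_token = 2 * active_params
    usable_flop_per_day = (
        h100_equivalents * flop_per_sec_per_gpu * SECONDS_PER_DAY * utilization
    )
    return usable_flop_per_day / flop_per_token


# Placeholder inputs (assumptions, not figures from the sources):
FLEET_H100E = 1e6        # hypothetical fleet of one million H100-equivalents
PEAK_FLOP_PER_SEC = 1e15 # rounded per-GPU peak, assumed for illustration
ACTIVE_PARAMS = 100e9    # hypothetical model with 100B active parameters

# Utilization bounds taken from the 5-30% range quoted in the estimates table.
for utilization in (0.05, 0.30):
    tokens = compute_implied_tokens_per_day(
        FLEET_H100E, PEAK_FLOP_PER_SEC, ACTIVE_PARAMS, utilization
    )
    print(f"utilization {utilization:.0%}: {tokens / 1e12:.1f} T/day")
```

The result scales linearly with fleet size and utilization and inversely with active parameters, so the sixfold spread in utilization alone produces a sixfold spread in the estimate.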

A note on comparability. These sources do not all measure the same thing: input vs output tokens, one provider vs all surfaces, marketplace samples vs global totals, multimodal vs text. We preserve what each source reported and flag the differences rather than forcing them into one number. See the methodology for how we normalize and what we exclude.