One trillion tokens per day. Is that a lot?
“And when we look narrowly at just the number of tokens served by Foundry APIs, we processed over 100t tokens this quarter, up 5x year over year, including a record 50t tokens last month alone.”
Microsoft shared that statistic in April: the record 50t-token month works out to roughly 1.7t tokens per day for their Foundry product.
Yesterday, Vipul shared that Together.ai is processing 2t tokens of open-source inference daily.
In July, Google announced a staggering number:
“At I/O in May, we announced that we processed 480 trillion monthly tokens across our surfaces. Since then we have doubled that number, now processing over 980 trillion monthly tokens, a remarkable increase.”
| Company | Daily Tokens (trillions) | vs Microsoft | Date |
|---|---|---|---|
| Google | 32.7 | ~19x | July 2025 |
| Together | 2.0 | ~1.2x | September 2025 |
| Microsoft Foundry | 1.7 | 1x | April 2025 |
Google processes 32.7t tokens daily, 16x more than Together & roughly 19x more than Microsoft Foundry's April volume.
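The conversions behind that table are simple division; here is a minimal sketch in Python, assuming 30-day months & treating Microsoft's record 50t month as the Foundry baseline.

```python
# Convert each announced token volume into a daily rate, then compare
# everything against Microsoft Foundry's April baseline.
DAYS_PER_MONTH = 30  # rough assumption used throughout

announced = {
    # company: (tokens in trillions, period in days)
    "Google": (980, DAYS_PER_MONTH),            # 980t monthly tokens (July 2025)
    "Together": (2.0, 1),                       # 2t per day (September 2025)
    "Microsoft Foundry": (50, DAYS_PER_MONTH),  # record 50t month (April 2025)
}

daily = {name: tokens / days for name, (tokens, days) in announced.items()}
baseline = daily["Microsoft Foundry"]

for name, rate in sorted(daily.items(), key=lambda kv: -kv[1]):
    print(f"{name:<18} {rate:5.1f}t/day  ({rate / baseline:4.1f}x Foundry)")

# Prints roughly:
# Google              32.7t/day  (19.6x Foundry)
# Together             2.0t/day  ( 1.2x Foundry)
# Microsoft Foundry    1.7t/day  ( 1.0x Foundry)
```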
From these figures, we can draw a few hypotheses:
- Open-source inference is a single-digit percentage of all inference. It's unclear what fraction of Google's inference tokens come from their open-source models like Gemma. But if we assume Anthropic & OpenAI serve 5t-10t tokens per day[^1], all closed-source, & that Azure is roughly similar in size to Google, then open-source inference is likely around 1-3% of total inference.[^2] The arithmetic is sketched in code after this list.
- Agents are early. Microsoft's data point suggests the agents within GitHub, Visual Studio, Copilot Studio, & Microsoft Fabric still drive only a few percent, at most, of overall AI inference on Azure.
- With Microsoft expected to invest $80 billion & Google $85 billion in AI data center infrastructure this year, each company's AI inference workloads should grow significantly, both from new hardware coming online & from algorithmic improvements.
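The open-source estimate from the first hypothesis (footnote 2) can be replayed in a few lines. In this sketch every input, 33t/day apiece for Google & Azure, six neoclouds at roughly 2t/day each, 5t/day each for Anthropic & OpenAI, & 5% of Google's traffic on open models, is a guess rather than a reported figure.

```python
# Replay of footnote 2's back-of-the-envelope math.
# Every figure below is an assumption, not a reported number.
google = azure = 33.0        # t/day each, guessed
neoclouds = 6 * 2.0          # Together + 5 similar providers at ~2t/day each
anthropic = openai = 5.0     # t/day each, guessed

total = google + azure + neoclouds + anthropic + openai   # 88t/day

open_model_share = 0.05                      # guess: 5% of Google traffic is open models
open_source = google * open_model_share      # ~1.65t/day

print(f"total inference  : {total:.0f}t/day")
print(f"open-source share: {open_source:.2f}t/day ({open_source / total:.1%})")
# total inference  : 88t/day
# open-source share: 1.65t/day (1.9%)
```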
“Through software optimization alone, we are delivering 90% more tokens for the same GPU compared to a year ago.”
Microsoft is squeezing more digital lemonade from their GPUs, & Google must be doing something similar.
When will we see the first 10t or 50t AI tokens processed per day? It can’t be far off now.
[^1]: Estimates from thin air!

[^2]: Google & Azure at 33t tokens per day each, Together & 5 other neoclouds at roughly 2t tokens per day each, & Anthropic & OpenAI at 5t tokens per day each gives us 88t tokens per day. If we assume 5% of Google's tokens come from open-source models, that's 1.65t tokens per day, or roughly 1.9% of total inference. Again, very rough math.