Intelligence Per Dollar
Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.[1]
Average token usage.
In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.
Benchmarks are now measured on two dimensions: overall performance & the cost of achieving it.
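To make the two dimensions concrete, here is a minimal sketch of how a score and an average token count might be combined into a single score-per-dollar figure. Everything in it is an assumption for illustration: the `BenchmarkResult` class, the model names, the token counts, and the prices are hypothetical placeholders, not figures from the release card. The only thing it borrows from the comparison above is the shape of the result: two models with the same score, one using a third of the tokens.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """One row of a hypothetical release card (all values are placeholders)."""
    name: str
    score: float                   # benchmark score, higher is better
    avg_tokens: float              # average tokens used per task
    usd_per_million_tokens: float  # assumed blended price per million tokens

    def cost_per_task(self) -> float:
        # Dollars spent on an average task at the assumed price.
        return self.avg_tokens * self.usd_per_million_tokens / 1_000_000

    def score_per_dollar(self) -> float:
        # "Intelligence per dollar": benchmark points per dollar of tokens.
        return self.score / self.cost_per_task()


# Hypothetical models: same score, same price, 3x difference in token usage.
frugal = BenchmarkResult("frugal-model", score=70.0,
                         avg_tokens=1_000_000, usd_per_million_tokens=1.0)
verbose = BenchmarkResult("verbose-model", score=70.0,
                          avg_tokens=3_000_000, usd_per_million_tokens=1.0)

# How many times more score per dollar the frugal model delivers.
advantage = frugal.score_per_dollar() / verbose.score_per_dollar()

for model in (frugal, verbose):
    print(f"{model.name}: ${model.cost_per_task():.2f} per task, "
          f"{model.score_per_dollar():.1f} points per dollar")
```

With equal scores and equal prices, a model that uses a third of the tokens delivers about three times the score per dollar. In practice the per-token price also differs between models, so the token count is only one input to the cost side.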