Last quarter, my AI inference costs hit $100,000 annualized.
I started small. Six months earlier, I was spending $200 a month on Claude. Then I added three agent subscriptions : Codex, Gemini, & Claude Code. That brought me to $600 a month.
Next I started using AI to transform my todo list into my done list, increasing tasks to 31 per day. Daily inference invoices of $92 started arriving. Then came $400 per month for browser agents.
Within two quarters, my annualized inference spend grew from $7,200 to $43,000 to a run rate of over $100,000.
So I migrated to an open source model. It took a weekend. The key was building the right testing loops : I had six months of historical task data, so I could replay requests through the new model & hill-climb to parity, with AI agents working through the night. By Sunday evening, the new model performed identically to the old one on those tasks. At 12% of the cost.
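A minimal sketch of what a replay-and-hill-climb loop like this could look like. Everything named here is hypothetical & for illustration only : `LoggedTask`, `parity_rate`, `hill_climb`, the `make_model` factory, & the 0.9 threshold are assumptions, & the text-similarity score is a crude stand-in for whatever grader a real setup would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable

Model = Callable[[str], str]  # takes a prompt, returns a response


@dataclass
class LoggedTask:
    prompt: str     # request originally sent to the incumbent model
    reference: str  # response the incumbent model produced


def similarity(a: str, b: str) -> float:
    """Crude text similarity in [0, 1]; a placeholder for a real grader."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def parity_rate(tasks: Iterable[LoggedTask], model: Model, threshold: float = 0.9) -> float:
    """Fraction of logged tasks where the candidate is close enough to the reference."""
    tasks = list(tasks)
    if not tasks:
        return 0.0
    passed = sum(similarity(model(t.prompt), t.reference) >= threshold for t in tasks)
    return passed / len(tasks)


def hill_climb(tasks, make_model, start: dict, proposals: list, threshold: float = 0.9):
    """Greedy hill-climb: keep a proposed config change only if parity improves."""
    tasks = list(tasks)
    best_cfg = dict(start)
    best_score = parity_rate(tasks, make_model(best_cfg), threshold)
    for change in proposals:
        trial = {**best_cfg, **change}
        score = parity_rate(tasks, make_model(trial), threshold)
        if score > best_score:
            best_cfg, best_score = trial, score
    return best_cfg, best_score
```

In practice `make_model` would wrap a call to the candidate model with a given prompt template or sampling settings, & `proposals` would be the changes the agents try overnight. The loop stops being useful once the parity rate plateaus.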
I’m not the only one paying attention to this cost.
Technology companies are adding a fourth component to engineering compensation : salary, bonus, options, & inference costs. Levels.fyi pegs the 75th percentile software engineer salary at $375k. Add $100k in inference & the fully loaded cost is $475k. That’s 21% in tokens.
The question CFOs will pose : what am I getting for all this inference spend? Can I do it cheaper?
If the metric for a new cloud is gross profit per GPU hour, the employee equivalent is : productive work per dollar of inference.
For me, the answer is 31 tasks a day at $12k annually. The engineer still burning $100k? They’d better be 8x more productive!
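The arithmetic behind these figures can be sketched in a few lines. The dollar amounts, the 12% ratio, & the 31 tasks a day come from the post; the 365-day year & the function names are my assumptions for illustration.

```python
# Figures quoted in the post.
SALARY = 375_000                 # 75th percentile software engineer salary
PROPRIETARY_INFERENCE = 100_000  # annual inference run rate before migrating
OPEN_SOURCE_FRACTION = 0.12      # open source cost as a share of the original
TASKS_PER_DAY = 31

# Assumption (not from the post): tasks run every calendar day.
DAYS_PER_YEAR = 365


def inference_share(salary: float, inference: float) -> float:
    """Inference as a fraction of fully loaded cost (salary + inference)."""
    return inference / (salary + inference)


def cost_per_task(annual_inference: float, tasks_per_day: float,
                  days: int = DAYS_PER_YEAR) -> float:
    """Dollars of inference spent per completed task."""
    return annual_inference / (tasks_per_day * days)


open_source_inference = PROPRIETARY_INFERENCE * OPEN_SOURCE_FRACTION
share = inference_share(SALARY, PROPRIETARY_INFERENCE)

# How much more productive the $100k engineer must be to match
# the same work per dollar of inference.
break_even_multiple = PROPRIETARY_INFERENCE / open_source_inference

print(f"Inference share of loaded cost: {share:.0%}")
print(f"Open source inference per year: ${open_source_inference:,.0f}")
print(f"Cost per task: ${cost_per_task(open_source_inference, TASKS_PER_DAY):.2f}")
print(f"Break-even productivity multiple: {break_even_multiple:.1f}x")
```

Under the 365-day assumption, the open source setup works out to roughly $1.06 per task, & the break-even multiple is about 8.3x, which I round to 8x.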
Will you be paid in tokens? In 2026, you will likely start to be.