Last quarter, my AI inference costs hit $100,000 annualized.

I started small. Six months earlier, I was spending $200 a month on Claude. Then I added three agent subscriptions : Codex, Gemini, & Claude Code. Now I was paying $600 a month.

Next, I started using AI to transform my to-do list into my done list, increasing my completed tasks to 31 per day. Daily inference invoices of $92 started arriving. Then came $400 per month on browser agents.

Within two quarters, my annualized inference spend grew from $7,200 to $43,000 to a run rate of over $100,000.

So I migrated to an open source model. It took a weekend. The key was building the right testing loops : I had six months of historical task data, so I could replay requests through the new model & hill-climb to parity, with AI agents working through the night. By Sunday evening, the new model performed identically to the old one. At 12% of the cost.
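A replay loop like this can be sketched in a few lines of Python. This is a toy illustration, not the actual harness : `Task`, `parity`, `hill_climb`, & `build_toy_model` are hypothetical names, the "model" is a stand-in string function rather than a real LLM call, & exact string match stands in for whatever grader judges two outputs equivalent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Task:
    prompt: str      # request taken from the historical log
    reference: str   # output the original model produced for it

Config = Dict[str, bool]
Model = Callable[[str], str]

def parity(tasks: List[Task], model: Model) -> float:
    """Fraction of replayed tasks whose output matches the logged reference."""
    if not tasks:
        return 0.0
    hits = sum(model(t.prompt) == t.reference for t in tasks)
    return hits / len(tasks)

def hill_climb(tasks: List[Task],
               build: Callable[[Config], Model],
               start: Config) -> Tuple[Config, float]:
    """Greedy search: flip one setting at a time, keep any flip that raises parity."""
    best = dict(start)
    best_score = parity(tasks, build(best))
    improved = True
    while improved and best_score < 1.0:
        improved = False
        for key in best:
            trial = {**best, key: not best[key]}
            score = parity(tasks, build(trial))
            if score > best_score:
                best, best_score, improved = trial, score, True
                break  # restart the scan from the improved config
    return best, best_score

def build_toy_model(config: Config) -> Model:
    """Hypothetical stand-in for the new model plus its prompt/config settings."""
    def model(prompt: str) -> str:
        out = prompt
        if config["strip"]:
            out = out.strip()
        if config["upper"]:
            out = out.upper()
        return out
    return model

# Toy "historical task data": prompts paired with the old model's outputs.
history = [
    Task(" file taxes ", "FILE TAXES"),
    Task("book flight", "BOOK FLIGHT"),
    Task(" renew passport", "RENEW PASSPORT"),
]

config, score = hill_climb(history, build_toy_model, {"strip": False, "upper": False})
print(config, score)
```

In a real migration, the settings being flipped would be things like prompt templates or sampling parameters, & the search stops once the parity score stops improving.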

I’m not the only one paying attention to this cost.

Technology companies are adding a fourth component to engineering compensation : salary, bonus, options, & inference costs. Levels.fyi pegs the 75th percentile software engineer salary at $375k. Add $100k in inference & the fully loaded cost is $475k. That’s 21% in tokens.
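The arithmetic behind that 21% is simple enough to check. The variable names below are illustrative, & the figures are the ones quoted above.

```python
salary = 375_000      # 75th percentile figure quoted above
inference = 100_000   # annual inference spend

fully_loaded = salary + inference
token_share = inference / fully_loaded

print(f"fully loaded cost: ${fully_loaded:,}")   # $475,000
print(f"share in tokens: {token_share:.0%}")     # 21%
```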

The question CFOs will pose : what am I getting for all this inference spend? Can I do it cheaper?

If the metric for a new cloud is gross profit per GPU hour, the employee equivalent is : productive work per dollar of inference.

For me, the answer is 31 tasks a day at $12k annually. The engineer still burning $100k? They’d better be 8x more productive!
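One rough way to put numbers on the metric : tasks completed per dollar of inference. The sketch below assumes 365 working days & the same 31 tasks a day at both price points, & `tasks_per_dollar` is a hypothetical helper, not a standard measure.

```python
def tasks_per_dollar(tasks_per_day: float,
                     annual_inference_cost: float,
                     days_per_year: int = 365) -> float:
    """Productive work per dollar of inference, measured in completed tasks."""
    return tasks_per_day * days_per_year / annual_inference_cost

open_model = tasks_per_dollar(31, 12_000)    # after the migration
frontier = tasks_per_dollar(31, 100_000)     # before the migration

# How much more output the $100k engineer needs to match the $12k one per dollar.
breakeven_multiple = open_model / frontier

print(f"{open_model:.2f} tasks per dollar at $12k")
print(f"{frontier:.2f} tasks per dollar at $100k")
print(f"break-even productivity multiple: {breakeven_multiple:.1f}x")
```

The multiple is just $100k divided by $12k, a little over 8x.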

Will you be paid in tokens? In 2026, you likely will be.