I set up a race today between two robots.

My Mac on the left vs Claude Code on the right. Both tasked with building a payment app on Stripe’s new Tempo blockchain. Same prompts, same task, side by side.

Opus 4.5 is about 20% smarter than Qwen 35B on benchmarks. And it’s likely 50x larger. The hare should have won. It didn’t.

The local model finished in about 2 minutes; Claude took over 6. I asked Claude to score both outputs: local model 6.5, Claude 4.5.

Video plays at 2x speed.

With 3x faster responses, I could add an extra cycle: “critique the plan and address the critiques.” While the hare was still thinking, the tortoise ran another lap.
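The extra cycle above is just a plan → critique → revise loop. A minimal sketch, assuming any local model behind a simple `query_model` call (the function below is a hypothetical stand-in, stubbed so the control flow runs as-is; swap in a real client such as an OpenAI-compatible endpoint):

```python
def query_model(prompt: str) -> str:
    # Placeholder for a real LLM call, e.g. an OpenAI-compatible
    # client pointed at a local Qwen server. Stubbed for illustration.
    return f"[model output for: {prompt[:40]}]"

def plan_with_critique(task: str, cycles: int = 1) -> str:
    """Draft a plan, then spend the saved latency on critique/revise rounds."""
    plan = query_model(f"Research the task and create a plan: {task}")
    for _ in range(cycles):
        critique = query_model(f"Critique this plan:\n{plan}")
        plan = query_model(
            f"Revise the plan to address these critiques:\n{critique}\n\nPlan:\n{plan}"
        )
    return plan

plan = plan_with_critique("payment app on Stripe's Tempo blockchain", cycles=1)
```

A 3x-faster model means `cycles=1` here costs roughly what a single un-critiqued draft costs on the slower model.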

| Prompt | Local (Qwen 35B) | Claude (Opus 4.5) |
|---|---|---|
| Research Tempo & create plan | 20.9s | 55s |
| Critique the plan | 16.5s | 1m 35s |
| Which language is best? | 16.5s | 1m 35s |
| Research feedback online | 48.9s | 2m 35s |
| Save implementation plan | 15.4s | 44s |
| Total | ~2 min | ~6 min 24s |

Faster responses mean more rounds of revision before a meeting ends or attention drifts. Agentic coding workflows and complex codebases are different: there, slower, more deliberate work may lead to better outcomes. But for everyday tasks, faster models enable tighter feedback loops, and tighter loops can produce better results.

We don’t always need the smartest AI to get the job done.