Teaching Local Models to Call Tools Like Claude

Ten months ago, DeepSeek collapsed AI training costs by 90% using distillation - transferring knowledge from larger models to smaller ones at a fraction of the cost.

Distillation works like a tutor training a student: a large model teaches a smaller one. As we’ve shifted from knowledge retrieval to agentic systems, we wondered whether there was a parallel technique for tool calling.

Could a large model teach a smaller one to call the right tools?
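The basic recipe would mirror standard distillation: let the large model produce tool-call traces, then fine-tune the small model on them. Below is a minimal sketch of that idea, not our actual pipeline; the teacher call is a hypothetical stub and the tool name and arguments are made up for illustration.

```python
# Sketch: distilling tool-calling behaviour by fine-tuning a small "student"
# model on tool-call traces produced by a large "teacher" model.
# query_teacher() is a hypothetical stand-in; swap in any real API client.
import json
from dataclasses import dataclass

@dataclass
class ToolCallTrace:
    prompt: str        # user request shown to the teacher
    tool_name: str     # tool the teacher chose
    arguments: dict    # arguments the teacher filled in

def query_teacher(prompt: str) -> ToolCallTrace:
    """Hypothetical stub for a call to a large model given tool definitions."""
    return ToolCallTrace(
        prompt=prompt,
        tool_name="search_flights",
        arguments={"origin": "SFO", "destination": "JFK", "date": "2025-09-01"},
    )

def to_sft_example(trace: ToolCallTrace) -> dict:
    """Turn one teacher trace into a supervised fine-tuning record for the student."""
    return {
        "messages": [
            {"role": "user", "content": trace.prompt},
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "type": "function",
                    "function": {
                        "name": trace.tool_name,
                        "arguments": json.dumps(trace.arguments),
                    },
                }],
            },
        ]
    }

if __name__ == "__main__":
    prompts = ["Book me a flight from San Francisco to New York on Sept 1."]
    with open("tool_call_sft.jsonl", "w") as f:
        for p in prompts:
            f.write(json.dumps(to_sft_example(query_teacher(p))) + "\n")
```

The student never needs the teacher's weights or logits; it only needs enough traces of which tool the teacher picked and how it filled in the arguments.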


From Knowledge to Action

GPT-5 launched yesterday. 94.6% on AIME 2025. 74.9% on SWE-bench.

As we approach the upper bounds of these benchmarks, they die.

What makes GPT-5 and the next generation of models revolutionary isn’t their knowledge. It’s knowing how to act. For GPT-5 this happens at two levels. First, by deciding which model should handle a request. Second, and more importantly, by calling tools.
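Concretely, "knowing how to act" at the tool-calling level means the model is handed a set of tool schemas and has to pick one and fill in its arguments. A typical OpenAI-style tool definition looks like the sketch below; the tool name and fields are illustrative, not from any specific product.

```python
# Illustrative only: an OpenAI-style function/tool schema the model chooses among.
flight_search_tool = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Find available flights between two airports on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. SFO"},
                "destination": {"type": "string", "description": "IATA code, e.g. JFK"},
                "date": {"type": "string", "description": "ISO date, e.g. 2025-09-01"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}
```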

We’ve been living in an era where LLMs mastered knowledge retrieval & reassembly. Consumer search & coding, the initial killer applications, are fundamentally knowledge retrieval challenges. Both organize existing information in new ways.
