Teaching Local Models to Call Tools Like Claude

Ten months ago, DeepSeek cut AI training costs by 90% using distillation: transferring knowledge from larger models to smaller ones at a fraction of the cost.

Distillation works like a tutor training a student: a large model teaches a smaller one.1 As we’ve shifted from knowledge retrieval to agentic systems, we wondered whether there was a parallel technique for tool calling.2
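For intuition only, here is a minimal sketch of the classic distillation objective: the student is trained to match the teacher's softened output distribution via KL divergence. The PyTorch framing, function name, and temperature value are illustrative assumptions, not DeepSeek's recipe or the setup described later in this post.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Generic knowledge-distillation loss (Hinton et al., 2015 style)."""
    # Soften both distributions with a temperature so the teacher's
    # "dark knowledge" (relative probabilities of wrong answers) is preserved.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```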

Could a large model teach a smaller one to call the right tools?
