Small AI Models Surpass Larger Ones with New Training Method

As the race toward ever-larger AI models drives computing costs to prohibitive levels, a technique called "On-Policy Distillation" is changing the game. Developed at Thinking Machines Lab, the startup led by former OpenAI CTO Mira Murati, the method lets smaller models reach performance levels previously reserved for much larger systems, at a fraction of the cost.

Efficiency Breakthrough: 8B Model Matches 32B Performance

Recent research shows that an 8-billion-parameter model trained with on-policy distillation can reach roughly 70% of the performance of a 32-billion-parameter model. Training cost drops by about 90%, and compute efficiency improves by a factor of 50 to 100. This could democratize AI development, enabling small and medium-sized enterprises, as well as individual developers, to train competitive specialized models.

How It Works: Real-Time Feedback Revolutionizes Training

The key innovation is a "dense feedback per token" mechanism. Unlike traditional reinforcement learning (RL), which delivers a sparse reward only at the end of each episode, on-policy distillation has the teacher model score every token the student model generates, in real time. This continuous guidance:

  • Accelerates convergence
  • Prevents "policy drift" during long-sequence training
  • Ensures consistent high-quality output from smaller models
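
To make the mechanism concrete, here is a minimal PyTorch sketch of the dense per-token feedback signal. The toy logits stand in for real causal language models (an 8B student scored by a 32B teacher in the article's setup); the shapes, seed, and variable names are illustrative assumptions, not the lab's actual code.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for one short decoding run of a student and a frozen
# teacher. Real on-policy distillation would use causal language models;
# the shapes and random values here are purely illustrative.
torch.manual_seed(0)
seq_len, vocab_size = 8, 100
student_logits = torch.randn(seq_len, vocab_size)
teacher_logits = torch.randn(seq_len, vocab_size)  # teacher is frozen

# On-policy: the student samples its own trajectory...
student_probs = F.softmax(student_logits, dim=-1)
tokens = torch.multinomial(student_probs, num_samples=1).squeeze(-1)

# ...and the teacher scores every sampled token with its log-probability,
# yielding one feedback value per token instead of a single sparse reward
# at the end of the episode.
teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
per_token_feedback = teacher_logprobs[torch.arange(seq_len), tokens]
print(per_token_feedback)  # dense signal: one score per generated token
```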

In practical tests, the Qwen3-8B model reached 70% accuracy on math-reasoning tasks in just 150 training steps, whereas traditional RL methods needed 17,920 GPU hours to achieve similar results.

Solving Catastrophic Forgetting: Retaining Knowledge While Learning New Skills

One persistent challenge in AI has been "catastrophic forgetting"—where models lose previously learned abilities when acquiring new knowledge. Traditional fine-tuning might see instruction-following ability drop from 85% to 45% when incorporating new documentation.

On-policy distillation addresses this through:

  • Real-time trajectory sampling
  • Gradual teacher correction

In tests, the method retained 41% of the newly acquired knowledge while quickly restoring the model's original capabilities to 83%, significantly outperforming conventional approaches.
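
As a rough illustration of how these two ingredients interact, here is a minimal, self-contained PyTorch sketch. The tiny linear layers are toy stand-ins for real language models, and the alternating schedule is an assumption made for illustration; the actual method samples full response trajectories from the student and scores them token by token against a frozen copy of the original model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, dim = 50, 16

# Toy stand-ins: the frozen "teacher" is a copy of the model before any
# new-knowledge training, so it preserves the original capabilities.
student = torch.nn.Linear(dim, vocab)
teacher = torch.nn.Linear(dim, vocab)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
new_docs = torch.randn(32, dim)                 # toy "new knowledge" batch
new_targets = torch.randint(0, vocab, (32,))

for step in range(100):
    # Phase A: absorb new knowledge with ordinary next-token fine-tuning.
    opt.zero_grad()
    F.cross_entropy(student(new_docs), new_targets).backward()
    opt.step()

    # Phase B: gradual teacher correction. Distill against the frozen
    # original model on fresh inputs, pulling any drifted behaviour back
    # toward the old capabilities while keeping the new ones.
    opt.zero_grad()
    probes = torch.randn(32, dim)               # stand-in for on-policy samples
    s_logp = F.log_softmax(student(probes), dim=-1)
    t_logp = F.log_softmax(teacher(probes), dim=-1)
    reverse_kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
    reverse_kl.backward()
    opt.step()
```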

Implementation: Simple Four-Step Process

The method's lightweight architecture consists of just four steps, repeated in a loop:

  1. Deploy a teacher model (e.g., 32B) as the supervision source
  2. Student model generates response trajectories
  3. Teacher calculates log probability for each token
  4. Optimize student parameters using reverse Kullback-Leibler divergence

The system works with existing distillation frameworks and requires no complex infrastructure, enabling what the researchers call a "cost-effective and accurate" performance leap.
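
The optimization target in step 4 is compact enough to write down directly. Here is a minimal PyTorch sketch of a per-token reverse-KL loss and a single update step on toy logits; it illustrates the objective under assumed shapes, not the lab's actual implementation.

```python
import torch
import torch.nn.functional as F

def reverse_kl(student_logits, teacher_logits):
    """Per-position reverse KL, KL(student || teacher), summed over the vocab.

    Minimizing the reverse direction is mode-seeking: the student is pushed
    to concentrate on outputs the teacher rates highly, rather than to
    spread probability mass over everything the teacher can do.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    return (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)

# One update on a toy trajectory (steps 2-4 of the loop above).
torch.manual_seed(0)
student_logits = torch.randn(8, 100, requires_grad=True)  # student's tokens
teacher_logits = torch.randn(8, 100)                      # frozen teacher
optimizer = torch.optim.Adam([student_logits], lr=1e-2)

loss = reverse_kl(student_logits, teacher_logits).mean()  # average over tokens
loss.backward()
optimizer.step()
```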

Implications for AI Democratization

Murati's approach exemplifies what industry observers call a "downgrade strike": beating larger systems with smarter training rather than simply scaling up parameters. This has significant implications:

  • Makes high-performance AI accessible on mobile and IoT devices
  • Reduces reliance on cloud-based "AI monopolies"
  • Enables continuous model evolution without capability loss

The technology is particularly promising for enterprise applications where models need to dynamically learn business rules without sacrificing core functionality like basic conversation and tool calling.

Key Points:

  • 90% cost reduction in AI training
  • Small (8B) models achieve 70% of the performance of large (32B) models
  • Solves catastrophic forgetting while adding new knowledge
  • Simple implementation compatible with existing frameworks
  • Potential to democratize AI development across industries
