
Alibaba's New AI Training Method Promises More Stable, Powerful Language Models


In the fast-moving world of artificial intelligence, Alibaba's Tongyi Qwen research team has developed a potentially game-changing approach to training large language models. Their new Soft Adaptive Policy Optimization (SAPO) method addresses one of the field's persistent headaches: keeping these complex systems stable during the crucial learning phase.


The Problem With Current Methods

Established approaches like Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) rely on what experts call "hard clipping" - strict limits on how far the model's policy can shift in a single training update. While this prevents disastrous mistakes, it comes with significant drawbacks. Imagine trying to learn piano while wearing thick gloves; you won't break anything, but you'll miss subtle nuances in your playing.
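To see what hard clipping looks like in practice, here is a minimal sketch of the PPO-style clipped surrogate that methods like GRPO build on. The function names and the epsilon value are illustrative, not taken from any particular codebase: the key point is the `min`/`clamp` pair, which zeroes out the learning signal whenever the new policy's probability ratio drifts outside a fixed band around 1.

```python
def hard_clipped_term(ratio, advantage, eps=0.2):
    """One token's contribution to a PPO/GRPO-style clipped objective.

    ratio:     new-policy probability / old-policy probability
    advantage: how much better this action was than the baseline
    eps:       half-width of the trust band around ratio = 1
    """
    # Hard clipping: force the ratio into [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the pessimistic (smaller) of the raw and clipped terms,
    # so updates outside the band contribute no extra gradient.
    return min(ratio * advantage, clipped * advantage)
```

Inside the band the term behaves normally; outside it, the contribution flattens abruptly - the "thick gloves" effect the article describes.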

"The existing methods often throw out valuable learning opportunities," explains Dr. Li Wei, lead researcher on the project. "If one part of a sequence performs poorly, current systems might discard the entire thing - like rejecting a whole essay because of one awkward sentence."

How SAPO Works Differently

The Qwen team's solution replaces these blunt-force restrictions with something more sophisticated. SAPO uses:

  • Smart filtering: Instead of hard cutoffs, it employs smooth, adjustable thresholds that preserve more useful information
  • Asymmetric handling: It treats positive and negative learning signals differently for better efficiency
  • Context awareness: The system makes decisions at both the sequence and individual token levels
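A rough intuition for the first two ideas can be sketched as a smooth, asymmetric token weight. Everything below is a hypothetical illustration of the general "soft gating" concept, not the actual SAPO formula (the paper linked below gives the real objective): the weight stays near 1 while the new policy is close to the old one and decays gradually, rather than cutting off at a hard threshold, with a sharper decay for negative learning signals.

```python
import math

def soft_weight(log_ratio, advantage, tau_pos=0.5, tau_neg=0.25):
    """Illustrative smooth gate on a token's importance ratio.

    log_ratio: log(new-policy prob / old-policy prob) for the token
    advantage: positive or negative learning signal
    tau_pos/tau_neg: hypothetical temperatures controlling how fast
        the weight decays; the asymmetry down-weights tokens with
        negative signals more aggressively.
    """
    tau = tau_pos if advantage >= 0 else tau_neg
    # Equals 1.0 when log_ratio == 0 (policies agree), and decays
    # smoothly and differentiably as the policies drift apart.
    return 2.0 / (1.0 + math.exp(abs(log_ratio) / tau))
```

Unlike a hard cutoff, this kind of gate never throws a token's contribution away entirely, and because it is differentiable everywhere, the optimizer sees a gradual penalty rather than a cliff.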

This approach maintains stability while allowing models to learn from more of their experiences. Early testing shows particular promise for mixture-of-experts models - the complex architectures powering today's most advanced AI systems.

Real-World Performance Gains

The proof came in rigorous testing across multiple domains:

  • Math problems: models trained with SAPO correctly solved 15% more of the complex equations
  • Coding tasks: Generated code showed fewer errors and better structure
  • Logical reasoning: Demonstrated more consistent performance on tricky word problems
  • Multimodal challenges: Combined text and visual information more effectively

"What excites us most is how broadly applicable these improvements are," notes Dr. Li. "From technical applications to creative tasks, we're seeing better results across the board."

The team has published their findings in detail (paper link: https://arxiv.org/abs/2511.20347), inviting peer review and collaboration from the global AI community.

Key Points:

  • Alibaba's SAPO method offers a smarter way to train large language models
  • Replaces crude "hard clipping" with nuanced, adaptive controls
  • Preserves valuable learning signals while maintaining stability
  • Shows measurable improvements across diverse AI applications
  • Particularly effective for complex mixture-of-experts architectures

