
Shanghai Researchers Boost AI Reflection Capabilities

Researchers from Shanghai Jiao Tong University and the Shanghai Artificial Intelligence Laboratory have made significant progress in enhancing the reflective abilities of multimodal large language models (MLLMs). Their MM-HELIX project addresses a critical limitation in current AI systems - the inability to effectively backtrack and reconsider an approach when it fails on a complex problem.

The Reflection Challenge in AI

While MLLMs demonstrate impressive capabilities in solving complex problems, they often exhibit "rigid" behavior during reasoning. Unlike humans, who can reflect on their approach after encountering obstacles, current models struggle with this metacognitive ability. The limitation becomes particularly evident on tasks that require multiple solution attempts or adaptive strategies.


Building MM-HELIX: A Comprehensive Solution

The research team took a three-pronged approach:

  1. The Ultimate Exam Benchmark: Developed to evaluate reflective reasoning across 42 highly complex tasks spanning algorithms, graph theory, puzzles, and strategy games.
  2. MM-HELIX-100K Dataset: Contains 100,000 high-quality samples that teach models reflection through Step-Elicited Response Generation (SERG).
  3. Adaptive Hybrid Policy Optimization (AHPO): An intelligent tutoring algorithm that gradually shifts models from expert guidance to independent exploration.
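The article does not give AHPO's exact formulation, but the description - an algorithm that gradually shifts models from expert guidance to independent exploration - suggests an objective that blends a supervised imitation term with an on-policy reinforcement term, weighted by how often the model already succeeds on its own. A minimal illustrative sketch, with all names and the gating rule being assumptions rather than the paper's actual method:

```python
def ahpo_loss(expert_loss: float, rl_loss: float,
              success_rate: float, gate: float = 0.5) -> float:
    """Illustrative adaptive hybrid objective (not the paper's exact AHPO).

    While the policy still fails often (success_rate below `gate`),
    imitation of expert reasoning traces dominates; as the model starts
    solving tasks on its own, the supervised term is annealed out and
    on-policy exploration takes over.
    """
    # Expert weight decays linearly from 1 to 0 as success_rate approaches gate.
    alpha = max(0.0, 1.0 - success_rate / gate)
    return alpha * expert_loss + (1.0 - alpha) * rl_loss


# A model that never succeeds trains almost purely on expert supervision;
# one that succeeds past the gate trains purely on its own rollouts.
for rate in (0.0, 0.25, 0.5, 1.0):
    print(rate, ahpo_loss(expert_loss=2.0, rl_loss=1.0, success_rate=rate))
```

The appeal of such a schedule is that it mirrors the tutoring analogy in the article: dense expert guidance early, withdrawn as competence grows, rather than a fixed mixture throughout training.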

The benchmark tests revealed that even state-of-the-art models perform poorly on reflective tasks, particularly under multimodal input conditions.


Measurable Improvements

The implementation showed promising results:

  • The SERG process significantly reduced response-generation time while minimizing redundant reasoning
  • Models equipped with MM-HELIX demonstrated stronger generalization capabilities
  • The Qwen2.5-VL-7B model achieved an 18.6% accuracy increase on benchmark tests

Key Points:

  • Current MLLMs lack effective reflection capabilities for complex reasoning tasks
  • MM-HELIX provides tools for evaluation (benchmark), training (dataset), and optimization (algorithm)
  • The system mimics human learning progression from guided to independent problem-solving
  • Demonstrated performance improvements validate the approach's effectiveness

