
Meituan LongCat Unveils UNO-Bench for Multimodal AI Evaluation

Meituan's LongCat Team Introduces Groundbreaking AI Evaluation Tool

Beijing, November 6, 2025 - Meituan's LongCat research team has unveiled UNO-Bench, a new benchmark designed to systematically evaluate multimodal large language models (MLLMs). The release marks a notable step forward in assessing AI systems' ability to understand and process information across different modalities.

Comprehensive Evaluation Framework

The benchmark covers 44 distinct task types and five modality combinations, giving researchers a single framework for measuring model performance in both single-modal and full-modal scenarios. According to the development team, UNO-Bench was created to address the growing need for standardized evaluation metrics as multimodal AI systems become increasingly sophisticated.
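
To give a concrete picture of what such a benchmark can look like in code, the sketch below shows one hypothetical way to organize samples by task type and modality combination. The combination labels and field names are illustrative assumptions; the article does not enumerate them, and the official UNO-Bench loaders may be structured differently.

```python
from dataclasses import dataclass

# Hypothetical modality-combination labels; the article says there are five
# combinations but does not list them, so these names are assumptions.
MODALITY_COMBOS = ["image+text", "audio+text", "video+text",
                   "audio+image+text", "audio+video+text"]

@dataclass
class Sample:
    task_type: str    # one of the 44 task types (names not given in the article)
    modalities: str   # one of MODALITY_COMBOS
    question: str
    reference_answer: str

def bucket_by_combo(samples: list[Sample]) -> dict[str, list[Sample]]:
    """Group samples so each modality combination can be scored and reported separately."""
    buckets: dict[str, list[Sample]] = {c: [] for c in MODALITY_COMBOS}
    for s in samples:
        buckets.setdefault(s.modalities, []).append(s)
    return buckets
```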


Robust Dataset Design

At the core of UNO-Bench lies its meticulously curated dataset:

  • 1,250 full-modal samples with 98% cross-modal solvability
  • 2,480 enhanced single-modal samples optimized for real-world applications
  • Special emphasis on Chinese-language context performance
  • An automated compression process that makes evaluation runs roughly 90% faster

The dataset maintains an impressive 98% consistency rate when tested against 18 public benchmarks, demonstrating its reliability for research purposes.

Innovative Evaluation Methodology

UNO-Bench introduces several notable features:

  • A multi-step, open-ended question format for assessing complex reasoning capabilities
  • A general scoring model that automatically evaluates six different question types, with a reported 95% accuracy in automated evaluations (see the sketch below)
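
As a rough illustration of how model-based automated scoring of open-ended answers typically works, the sketch below grades a prediction against a reference answer using a generic scoring model. The prompt format, the `score_model` callable, and the grading scale are assumptions made for illustration; they are not the official UNO-Bench scoring model or its API.

```python
# Minimal sketch of model-based answer grading, assuming a generic
# score_model(prompt: str) -> str callable; the real UNO-Bench scoring model,
# prompt format, and question-type taxonomy are not described in the article.
def grade_answer(question: str, reference: str, prediction: str,
                 question_type: str, score_model) -> float:
    """Ask a scoring model to grade a prediction on a 0-1 scale."""
    prompt = (
        f"Question type: {question_type}\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {prediction}\n"
        "Return only a score between 0 and 1 for the candidate answer."
    )
    reply = score_model(prompt)
    try:
        return min(1.0, max(0.0, float(reply.strip())))
    except ValueError:
        return 0.0  # unparsable replies are treated as incorrect
```

In practice the scoring model would itself be an LLM call, and multi-step questions would presumably be graded per step before aggregating into a final score.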


Future Development Plans

While currently focused on Chinese-language applications, the LongCat team is actively seeking international partners to develop:

  • English-language version
  • Multilingual adaptations

The complete UNO-Bench dataset is now available for download via the Hugging Face platform, with related code and documentation accessible on GitHub.
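
For teams that want to pull the data, a typical Hugging Face workflow looks like the sketch below. The repository identifier is a guess based on the team name and is not confirmed by the article; check the LongCat organization page on Hugging Face for the exact dataset name.

```python
# Hypothetical download sketch using the Hugging Face datasets library.
# The repository id below is an assumption; verify it on the LongCat
# organization page before running.
from datasets import load_dataset

uno_bench = load_dataset("meituan-longcat/UNO-Bench")  # repo id is a guess
print(uno_bench)                       # list the available splits
first_split = next(iter(uno_bench))
print(uno_bench[first_split][0])       # inspect one sample
```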

Key Points:

  1. UNO-Bench evaluates multimodal AI across 44 tasks and 5 modality combinations
  2. Features curated dataset with 98% cross-modal solvability
  3. Introduces innovative multi-step question format
  4. Currently focused on Chinese with plans for English/multilingual versions
  5. Available now on Hugging Face and GitHub

