LongCat-Flash-Omni Launches with Multimodal Breakthroughs

Meituan Unveils LongCat-Flash-Omni with Revolutionary Multimodal Capabilities

November 3, 2025 - Following the successful launch of its LongCat-Flash series in September, Meituan has now introduced LongCat-Flash-Omni, a groundbreaking multimodal AI model that sets new standards for real-time interaction across text, image, video, and speech modalities.

Technical Innovations

The model builds upon Meituan's efficient architecture with several key advancements:

  • Shortcut-Connected MoE (ScMoE) Technology: Enables efficient inference despite the model's 560 billion total parameters, of which only about 27 billion are activated per token (see the sketch after this list)
  • Integrated Multimodal Modules: Combines perception and speech reconstruction in an end-to-end design
  • Progressive Fusion Training: Addresses data distribution challenges across different modalities
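
To make the ScMoE idea concrete, here is a minimal, illustrative sketch of a shortcut-connected mixture-of-experts block in PyTorch. It reflects one simplified reading of the design (a dense shortcut branch running alongside sparse top-k expert routing, so only a fraction of parameters fire per token); all names and sizes are invented for illustration and this is not Meituan's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScMoEBlock(nn.Module):
    """Toy shortcut-connected MoE layer: sparse experts plus a dense shortcut."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        # Dense shortcut branch that bypasses expert dispatch entirely.
        self.shortcut = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())

    def forward(self, x):                              # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)  # sparse activation
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(1) * expert(x[mask])
        return out + self.shortcut(x)                  # shortcut rejoins the MoE path

x = torch.randn(16, 512)
print(ScMoEBlock()(x).shape)                           # torch.Size([16, 512])
```

In the full system, the shortcut branch is reportedly what lets dense computation overlap with the communication needed to dispatch tokens to experts, making the 27B-of-560B sparse activation cheap at inference time.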

Performance Benchmarks

Independent evaluations confirm LongCat-Flash-Omni achieves:

  • State-of-the-art (SOTA) results in open-source multimodal benchmarks
  • No performance degradation when switching between modalities ("no intelligence reduction")
  • Real-time audio-video interaction with latency below typical industry levels
  • Exceptional scores in:
    • Text understanding (+15% over previous models)
    • Image recognition (98.7% accuracy)
    • Speech naturalness (4.8/5 human evaluation)

Developer Applications

The release includes multiple access channels:

  • Official app with voice call functionality (video coming soon)
  • Web interface supporting file uploads and multimodal queries
  • Open-source availability on Hugging Face and GitHub (a loading sketch follows below)
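
For developers going the open-source route, a minimal loading sketch with Hugging Face transformers might look like the following. The repository id is an assumption (check the official LongCat pages for the exact name), and a 560-billion-parameter checkpoint will realistically need a multi-GPU or offloaded setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO_ID = "meituan-longcat/LongCat-Flash-Omni"  # assumed repo id; verify on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    trust_remote_code=True,  # custom multimodal architecture ships with the repo
    device_map="auto",       # shard the weights across available GPUs
)

inputs = tokenizer("Summarize today's AI news:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Multimodal inputs (images, audio, video) would go through the model's own processor classes rather than the plain text tokenizer shown here.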

Key Points

  • First open-source model to combine offline understanding with real-time audio-video interaction
  • Lightweight audio decoder enables natural speech reconstruction
  • Progressive early-fusion training prevents interference between modalities (a toy schedule is sketched after this list)
  • Currently supports Chinese/English with more languages planned for Q1 2026
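
The progressive-fusion idea above can be pictured as a staged curriculum: start from a strong text backbone, then fold in one modality at a time so no single data distribution dominates. The stages and mixing below are invented purely for illustration; the actual schedule is defined by Meituan's training recipe, which this article does not detail.

```python
# Toy progressive-fusion curriculum: each stage adds one modality.
STAGES = [
    ("stage_1", ["text"]),
    ("stage_2", ["text", "image"]),
    ("stage_3", ["text", "image", "audio"]),
    ("stage_4", ["text", "image", "audio", "video"]),
]

def run_curriculum(train_step, steps_per_stage=3):
    for name, modalities in STAGES:
        for step in range(steps_per_stage):
            # A real schedule would reweight modalities by data volume
            # and per-modality loss instead of simple round-robin.
            modality = modalities[step % len(modalities)]
            train_step(name, modality)

run_curriculum(lambda stage, m: print(f"{stage}: batch of {m} data"))
```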
