
Meituan's New AI Model Packs Big Performance in Small Package



In the world of AI models, bigger hasn't always meant better. Traditional Mixture of Experts (MoE) architectures often hit diminishing returns as they scale up expert counts. Meituan's LongCat team flipped this script with their new LongCat-Flash-Lite model, achieving remarkable results through an innovative approach they call "Embedding Expansion."

Rethinking How Models Scale

The breakthrough came when researchers discovered something counterintuitive: expanding embedding layers could outperform simply adding more experts. The numbers tell the story: while the full model contains 68.5 billion parameters, each inference step activates just 2.9 to 4.5 billion of them, thanks to clever N-gram embedding layers.

"We've allocated over 30 billion parameters specifically to embedding," explains the technical report. "This lets us capture local semantics precisely - crucial for recognizing specialized contexts like programming commands."


Engineering Efficiency at Every Level

Theoretical advantages don't always translate to real-world performance. Meituan addressed this through three key optimizations:

  1. Smart Parameter Use: Nearly half (46%) of parameters go to embedding layers, keeping computational growth manageable.
  2. Custom Hardware Tricks: Specialized caching (similar to KV Cache) and fused CUDA kernels slash I/O delays.
  3. Predictive Processing: A three-step speculative decoding approach expands batch sizes efficiently.
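The speculative-decoding idea behind the third optimization can be sketched in miniature. The draft and target functions below are toy stand-ins, not LongCat's actual three-step pipeline: a cheap draft model proposes several tokens, the full model verifies them (in one batched pass in practice), and the longest agreeing prefix is kept plus one correction from the target.

```python
# Toy sketch of speculative decoding (assumed mechanics, not Meituan's code).

def draft_propose(prefix, k=3):
    """Hypothetical cheap draft model: guesses the next k tokens."""
    return [(prefix[-1] + i + 1) % 50 for i in range(k)]

def target_next(prefix):
    """Hypothetical full model: the 'ground truth' greedy next token."""
    return (prefix[-1] * 3 + 1) % 50

def speculative_step(prefix, k=3):
    """Accept the draft's agreeing prefix, then one token from the target."""
    proposed = draft_propose(prefix, k)
    accepted = []
    ctx = list(prefix)
    for tok in proposed:
        if target_next(ctx) == tok:    # verification (batched in practice)
            accepted.append(tok)
            ctx.append(tok)
        else:
            break                      # first mismatch ends the accepted run
    accepted.append(target_next(ctx))  # target supplies the correcting token
    return accepted

out = [7]
for _ in range(4):
    out += speculative_step(out)
print(out)  # [7, 22, 17, 2, 7] - identical to decoding with the target alone
```

The output is guaranteed to match what the full model would have produced on its own; the speedup comes from verifying several draft tokens per full-model pass, which is what lets batch sizes expand without changing the generated text.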

The result? Throughput of 500-700 tokens per second on substantial inputs (4K tokens) with outputs up to 1K tokens, all while supporting context windows as long as 256K tokens.

Benchmark-Busting Performance

The proof comes in testing where LongCat-Flash-Lite punches above its weight:

  • Excels at practical applications like telecom support and retail scenarios on τ²-Bench
  • Shows particular strength in coding (54.4% on SWE-Bench) and command execution (33.75 on TerminalBench)
  • Holds its own on general knowledge (85.52 on MMLU) against larger models like Gemini 2.5 Flash-Lite

The entire package, including weights, technical documentation, and the SGLang-FluentLLM inference engine, is now open source through Meituan's LongCat API Open Platform, which offers developers generous daily testing allowances.

