
AI2's Molmo 2 Brings Open-Source Video Intelligence to Your Fingertips

A New Era of Open Video Intelligence

The Allen Institute for Artificial Intelligence (AI2) is shaking up the AI world again with its latest release: Molmo 2. This isn't just another language model - it's specifically designed to understand videos and images, and best of all, it's completely open-source.


What's Under the Hood?

Molmo 2 comes in several flavors:

  • Molmo2-4B & Molmo2-8B: Built on Alibaba's Qwen3 foundation
  • Molmo2-O-7B: A fully transparent version using AI2's own Olmo architecture

The package includes nine new datasets covering everything from multi-image analysis to video tracking - essentially giving developers the building blocks to create custom video understanding systems.

Why This Matters for Businesses

Ranjay Krishna, who leads perception research at AI2, explains what sets Molmo 2 apart: "These models don't just answer questions - they can pinpoint exactly when and where events happen in videos." Imagine asking "When did the player score?" and getting not just the answer but the exact timestamp.

The models pack some impressive capabilities:

  • Generating detailed video descriptions
  • Counting objects across frames
  • Spotting rare events in long footage
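
To make the timestamp answers Krishna describes concrete, here is a minimal sketch of the bookkeeping a developer would need on their own side: sample frames from a clip at a fixed interval and keep each frame's timestamp, so whatever frame a model points to can be mapped back to a time in the video. The model call itself is omitted, the file name match.mp4 is hypothetical, and nothing in this snippet is specific to Molmo 2.

```python
# Minimal sketch (not Molmo 2-specific): sample frames at a fixed interval and
# record each frame's timestamp, so a model's frame-level answer can be mapped
# back to a wall-clock time in the clip. "match.mp4" is a hypothetical file.
import cv2  # pip install opencv-python

def sample_frames(path, every_n_seconds=0.5):
    """Return a list of (timestamp_seconds, rgb_frame) pairs."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if metadata is missing
    step = max(int(round(fps * every_n_seconds)), 1)
    samples, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            samples.append((idx / fps, cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        idx += 1
    cap.release()
    return samples

frames = sample_frames("match.mp4")
# If the model reports the event at sampled frame k, frames[k][0] is the
# timestamp (in seconds) to show the user.
```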

The Open-Source Advantage

In an industry where most powerful models are locked behind corporate walls, AI2's commitment to openness stands out. As analyst Bradley Shimmin notes: "For companies worried about data sovereignty or needing custom solutions, having full access to model weights and training data is invaluable."

The relatively compact size (4B-8B parameters) makes Molmo 2 practical for real-world deployment. Shimmin adds: "Enterprises are realizing bigger isn't always better - what matters is having control and understanding of your AI tools."

Try It Yourself

Curious developers can test-drive Molmo 2 for themselves; the complete project details are available at allenai.org/blog/molmo2.
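
For a feel of what querying these checkpoints might look like, the sketch below follows the Hugging Face usage pattern published for the original Molmo release (AutoProcessor and AutoModelForCausalLM with trust_remote_code=True). The repository name allenai/Molmo2-8B and the processor.process / generate_from_batch helpers are assumptions carried over from Molmo 1; Molmo 2's actual interface, especially for video inputs, may differ, so check the official model cards first.

```python
# Hedged sketch patterned on the original Molmo's Hugging Face usage; the repo
# id "allenai/Molmo2-8B" and the process()/generate_from_batch() helpers are
# assumptions carried over from Molmo 1. Verify against the Molmo 2 model cards.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo2-8B"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")

# Single-image question; Molmo 2's video path may use a different entry point.
inputs = processor.process(
    images=[Image.open("frame.jpg")],  # hypothetical local image
    text="How many players are visible in this frame?")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer)

answer = processor.tokenizer.decode(
    output[0, inputs["input_ids"].size(1):], skip_special_tokens=True)
print(answer)
```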

Key Points:

  • Open access: Full model weights and training data available
  • Video smarts: Understands temporal events and spatial relationships
  • Developer friendly: Multiple size options balance capability with efficiency
  • Transparent AI: Complete visibility into how models were built


Related Articles

MIT's Automated 'Motion Factory' Teaches AI Physical Intuition

Researchers from MIT, NVIDIA, and UC Berkeley have cracked a major challenge in video analysis - teaching AI to understand physical motion. Their automated 'FoundationMotion' system generates high-quality training data without human input, helping AI systems grasp concepts like trajectory and timing with surprising accuracy. Early tests show it outperforms much larger models, marking progress toward machines that truly understand how objects move.

January 12, 2026
computer vision, AI training, motion analysis

Chinese Researchers Teach AI to Spot Its Own Mistakes in Image Creation

A breakthrough from Chinese universities tackles AI's 'visual dyslexia' - where image systems understand concepts but struggle to correctly portray them. Their UniCorn framework acts like an internal quality control team, catching and fixing errors mid-creation. Early tests show promising improvements in spatial accuracy and detail handling.

January 12, 2026
AI innovation, computer vision, machine learning

Tech Veteran Launches liko.ai to Bring Smarter Privacy-Focused Home Cameras

Ryan Li, former Meituan hardware chief, has secured funding from SenseTime and iFLYTEK affiliates for his new venture liko.ai. The startup aims to revolutionize home security cameras with edge-based AI that processes video locally rather than in the cloud - addressing growing privacy concerns while adding smarter detection capabilities. Their first products are expected mid-2026.

January 7, 2026
smart home, computer vision, edge computing

Smart Home Startup liko.ai Lands Funding for Edge AI Vision

AI startup liko.ai has secured its first round of funding from prominent investors including SenseTime Guoxiang Capital and Oriental Fortune Sea. The company, led by smart hardware veteran Ryan Li, aims to transform home automation with edge-based vision-language models that process data locally rather than in the cloud. Their AI Home Center promises smarter, more private smart home experiences.

January 6, 2026
edge computing, smart home, computer vision

ByteDance's StoryMem Gives AI Videos a Memory Boost

ByteDance and Nanyang Technological University researchers have developed StoryMem, an innovative system tackling persistent issues in AI video generation. By mimicking human memory mechanisms, it maintains character consistency across scenes - a challenge even for models like Sora and Kling. The solution cleverly stores key frames as references while keeping computational costs manageable. Early tests show significant improvements in visual continuity and user preference scores.

January 4, 2026
AI video generation, ByteDance, computer vision

ByteDance's StoryMem Brings Consistency to AI-Generated Videos

ByteDance and Nanyang Technological University researchers have developed StoryMem, a breakthrough system tackling character consistency issues in AI video generation. By intelligently storing and referencing key frames, the technology maintains visual continuity across scenes - achieving 28.7% better consistency than existing models. While promising for storytelling applications, the system still faces challenges with complex multi-character scenes.

January 4, 2026
AI video generation, ByteDance, computer vision