
AI2's Molmo 2 Brings Open-Source Video Intelligence to Your Fingertips

A New Era of Open Video Intelligence

The Allen Institute for Artificial Intelligence (AI2) is shaking up the AI world again with its latest release: Molmo 2. This isn't just another language model: it's built specifically to understand videos and images, and best of all, it's completely open source.


What's Under the Hood?

Molmo 2 comes in several flavors:

  • Molmo2-4B & Molmo2-8B: Built on Alibaba's Qwen3 foundation
  • Molmo2-O-7B: A fully transparent version using AI2's own Olmo architecture

The package includes nine new datasets covering everything from multi-image analysis to video tracking, essentially giving developers the building blocks to create custom video understanding systems.

Why This Matters for Businesses

Ranjay Krishna, who leads perception research at AI2, explains what sets Molmo 2 apart: "These models don't just answer questions - they can pinpoint exactly when and where events happen in videos." Imagine asking "When did the player score?" and getting not just the answer but the exact timestamp.

The models pack some impressive capabilities:

  • Generating detailed video descriptions
  • Counting objects across frames
  • Spotting rare events in long footage
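Grounded answers like the "when did the player score?" example above typically arrive as text with inline location markup that an application then has to parse. As a minimal sketch, here is how such output might be consumed; the `<point>` tag format is an assumption modeled on the original Molmo's pointing markup, and the sample string is invented for illustration, not actual Molmo 2 output:

```python
import re

# Hypothetical grounded output: the original Molmo emits XML-like <point>
# tags for spatial grounding; the exact Molmo 2 format is assumed here.
SAMPLE_OUTPUT = (
    'The player scores here: '
    '<point x="52.4" y="31.0" alt="player">player</point> '
    '<point x="48.1" y="29.5" alt="ball">ball</point>'
)

POINT_RE = re.compile(r'<point x="([\d.]+)" y="([\d.]+)"[^>]*>([^<]*)</point>')

def extract_points(text):
    """Return (x, y, label) tuples from point-annotated model output."""
    return [(float(x), float(y), label) for x, y, label in POINT_RE.findall(text)]

points = extract_points(SAMPLE_OUTPUT)
print(points)
```

The same pattern extends naturally to counting objects across frames: run the parser on each frame's output and tally labels.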

The Open-Source Advantage

In an industry where most powerful models are locked behind corporate walls, AI2's commitment to openness stands out. As analyst Bradley Shimmin notes: "For companies worried about data sovereignty or needing custom solutions, having full access to model weights and training data is invaluable."

The relatively compact size (4B-8B parameters) makes Molmo 2 practical for real-world deployment. Shimmin adds: "Enterprises are realizing bigger isn't always better - what matters is having control and understanding of your AI tools."

Try It Yourself

Curious developers can test drive Molmo 2 themselves; the complete project details are available at allenai.org/blog/molmo2.

Key Points:

  • Open access: Full model weights and training data available
  • Video smarts: Understands temporal events and spatial relationships
  • Developer friendly: Multiple size options balance capability with efficiency
  • Transparent AI: Complete visibility into how models were built

