Highlights:

  • Molmo was evaluated internally at the Allen Institute for AI against multiple proprietary large language models across eleven benchmarks.
  • Two of the models in the Llama 3.2 series focus on text processing tasks.

The Allen Institute for AI has launched Molmo, a suite of open-source language models that can process both images and text.

The launch took place against the backdrop of Meta Platforms Inc.’s Connect 2024 product event, where the company unveiled new mixed-reality hardware along with its own open-source Llama 3.2 language model series. Like Molmo, two of the models in that lineup incorporate multimodal processing features.

The Allen Institute for AI, or Ai2, is a Seattle-based nonprofit focused on machine learning research. The new Molmo series comprises four neural networks: the most sophisticated model has 72 billion parameters, the hardware-efficient model has one billion, and the other two have seven billion each.

All four models respond to natural language requests and also offer multimodal processing capabilities. Molmo can recognize, count, and describe objects in an image, and it can carry out related tasks such as explaining the data displayed in a chart.
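As an illustration, the sketch below shows how such a prompt might be issued in Python. It assumes the checkpoints are published on Hugging Face under an identifier such as allenai/Molmo-7B-D-0924 and that they expose custom processing and generation helpers loaded via trust_remote_code; the exact interface is an assumption and may differ.

    # Minimal sketch, assuming Molmo weights are hosted on Hugging Face under
    # an identifier like "allenai/Molmo-7B-D-0924" (assumed name) and that the
    # repository ships custom processing/generation helpers via trust_remote_code.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

    MODEL_ID = "allenai/Molmo-7B-D-0924"  # assumed repository identifier

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, trust_remote_code=True, device_map="auto"
    )

    # Pair an image with a natural language request, e.g. counting objects.
    inputs = processor.process(
        images=[Image.open("photo.jpg")],
        text="How many dogs are in this picture? Describe each one briefly.",
    )
    inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer,
    )

    # Decode only the newly generated tokens, i.e. the model's answer.
    answer = output[0, inputs["input_ids"].size(1):]
    print(processor.tokenizer.decode(answer, skip_special_tokens=True))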

Molmo was evaluated internally at the Allen Institute for AI against multiple proprietary large language models across eleven benchmark tests. The version with 72 billion parameters scored 81.2, marginally surpassing OpenAI’s GPT-4o, while the two versions with seven billion parameters came within five points of the OpenAI model.

The smallest model in the series, with one billion parameters, has more limited processing power. According to the Allen Institute for AI, however, it can still outperform some systems with ten times as many parameters, and it is small enough to run on a mobile device.

One factor behind the Molmo series’ processing power is its training dataset, which contains several hundred thousand images, each paired with an extremely thorough description of its contents. By learning from those descriptions, the Allen Institute for AI claims, Molmo was able to outperform larger models trained on lower-quality data at object recognition tasks.

“We take a vastly different approach to sourcing data with an intense focus on data quality, and are able to train powerful models with less than 1M image text pairs, representing 3 orders of magnitude less data than many competitive approaches,” Molmo developers reported.

Molmo’s introduction coincided with the event at which Meta released Llama 3.2, its new family of language models. Like Molmo, the lineup comprises four open-source neural networks.

The first two models have 11 billion and 90 billion parameters, respectively, and their multimodal architecture enables them to process images as well as text. According to Meta, the models match GPT-4o mini, a scaled-down version of GPT-4o, on image recognition tests.
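For comparison, a similar interaction with one of the multimodal Llama 3.2 models might look like the sketch below, assuming the instruction-tuned 11-billion-parameter checkpoint is available through the Hugging Face transformers library under a name such as meta-llama/Llama-3.2-11B-Vision-Instruct; the identifier and interface details are assumptions here as well.

    # Minimal sketch, assuming the Llama 3.2 vision model is accessible through
    # the Hugging Face transformers library under an identifier such as
    # "meta-llama/Llama-3.2-11B-Vision-Instruct" (assumed name).
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed identifier

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = MllamaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Build a chat-style prompt that pairs an image with a recognition request.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What objects appear in this image?"},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(
        Image.open("photo.jpg"), prompt, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)

    output = model.generate(**inputs, max_new_tokens=150)
    print(processor.decode(output[0], skip_special_tokens=True))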

The other two models in the Llama 3.2 series focus on text processing. The more sophisticated of the two has three billion parameters, roughly three times as many as the other. According to Meta, both models outperform algorithms of similar size on a variety of tasks.