Highlights:

  • Meta’s researchers suggest that generating output four tokens at a time may help overcome limitations of the teacher-forcing training approach.
  • Meta has developed four new models for code generation tasks, each with 7 billion parameters.

Meta Platforms Inc. has released four open-source language models that utilize a cutting-edge machine-learning technique called multi-token prediction.

The models were announced recently, and Meta has shared their code on Hugging Face, a widely used platform for hosting AI projects.

Large language models typically generate text or code one token at a time. A token is a small unit of data, usually a word fragment or a few characters. Meta’s new open-source models, by contrast, produce four tokens at once, using a processing technique called multi-token prediction that the company believes can make LLMs both more accurate and faster.
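
To make the contrast concrete, here is a toy Python sketch of the two decoding styles. It is purely illustrative, not Meta’s code: fake_forward is a hypothetical stand-in for a model forward pass that always proposes four token IDs.

```python
# Toy contrast: one-token-at-a-time decoding vs. multi-token decoding.
# fake_forward is a hypothetical stand-in for a real model's forward pass.

def fake_forward(context: list[int]) -> list[int]:
    """Pretend the model proposes the next four token IDs."""
    return [len(context) + i for i in range(4)]

def decode_one_at_a_time(n_tokens: int) -> list[int]:
    out: list[int] = []
    while len(out) < n_tokens:
        out.append(fake_forward(out)[0])  # keep only the first prediction
    return out                            # needs n_tokens forward passes

def decode_four_at_a_time(n_tokens: int) -> list[int]:
    out: list[int] = []
    while len(out) < n_tokens:
        out.extend(fake_forward(out))     # keep all four predictions
    return out[:n_tokens]                 # needs ~n_tokens / 4 forward passes

print(decode_one_at_a_time(8))   # 8 forward passes
print(decode_four_at_a_time(8))  # 2 forward passes
```

The multi-token version reaches the same output length in roughly a quarter of the forward passes, which is where the speed gains come from.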

Meta has introduced four new models designed for code generation tasks, each with 7 billion parameters. Two of the models were trained on 200 billion tokens of code samples, while the other two were trained on 1 trillion tokens each. In a paper accompanying the models, Meta also revealed that it has developed a fifth, not-yet-released large language model with 13 billion parameters.

Internally, each model consists of two main components. The first is a shared trunk responsible for the initial computations required to generate a code snippet. The second is a set of four output heads that manage the subsequent steps of the generation process: each head produces one token per step, which is what allows Meta’s models to generate four tokens simultaneously.
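
That description maps naturally onto a small PyTorch sketch. The vocabulary size, dimensions, and depth below are invented for illustration, and details such as causal masking are omitted; this is a sketch of the trunk-plus-heads layout, not the released models’ actual architecture.

```python
import torch
import torch.nn as nn

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared trunk: runs once per step and does the bulk of the work.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=6)
        # Four independent output heads, one per future token position.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, token_ids):
        hidden = self.trunk(self.embed(token_ids))  # (batch, seq, d_model)
        last = hidden[:, -1]                        # final-position state
        # Head i predicts the token i + 1 steps ahead of the input.
        return [head(last) for head in self.heads]

model = MultiTokenPredictor()
logits_per_head = model(torch.randint(0, 32000, (1, 16)))
next_four_tokens = [logits.argmax(-1).item() for logits in logits_per_head]
```

The design point is that the expensive trunk computation is paid once per step, while the four comparatively cheap output heads fan out from it.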

Exactly why this approach yields higher-quality code than traditional LLM designs remains uncertain. Meta’s researchers suggest in their paper that the answer may lie in the way language models are commonly built and trained.

Developers often train LLMs with a method called teacher-forcing. The model is given a task, such as generating a segment of code, and during training is always shown the correct continuation rather than its own possibly erroneous output. While this approach streamlines training, it can limit the accuracy of the resulting LLM.
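
In code, teacher-forcing amounts to always feeding the model the ground-truth prefix and scoring it only on the single next token. The sketch below is a generic illustration of one training step, assuming a model that returns per-position logits; it is not Meta’s training code.

```python
import torch.nn.functional as F

def teacher_forcing_loss(model, token_ids):
    """token_ids: (batch, seq_len) tensor holding a ground-truth sequence."""
    inputs = token_ids[:, :-1]   # the model sees the true prefix, never
    targets = token_ids[:, 1:]   # its own (possibly wrong) earlier outputs
    logits = model(inputs)       # assumed shape: (batch, seq_len - 1, vocab)
    # The loss rewards predicting only the immediate next token per position.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
```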

Meta’s researchers suggest that generating output four tokens at a time could alleviate the constraints of the teacher-forcing approach. As they explained, “Teacher-forcing, we argue, encourages models to focus on predicting well in the very short term, at the potential expense of ignoring longer-term dependencies in the overall structure of the generated sequence.”
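
One way to read that argument in code: score each of the four output heads against a different future offset, so the training objective explicitly rewards looking beyond the immediate next token. This is a hedged sketch under the same assumptions as the previous examples (each head returning per-position logits), not the exact loss from Meta’s paper.

```python
import torch.nn.functional as F

def multi_token_loss(model, token_ids, n_future=4):
    """model(inputs) is assumed to return a list of n_future logits tensors,
    each (batch, seq_len, vocab); head i predicts the token i + 1 steps ahead.
    """
    inputs = token_ids[:, :-n_future]
    logits_per_head = model(inputs)
    loss = 0.0
    for i, logits in enumerate(logits_per_head):
        # Head i is trained on targets shifted i + 1 positions into the future.
        targets = token_ids[:, i + 1 : i + 1 + inputs.size(1)]
        loss = loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )
    return loss / n_future
```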

Meta evaluated the accuracy of its multi-token prediction models using the MBPP and HumanEval benchmarks. MBPP includes approximately 1,000 Python coding tasks, while HumanEval offers a smaller set of more challenging, hand-written Python programming problems.
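
For readers who want to inspect the benchmarks themselves, both have publicly hosted copies on the Hugging Face Hub. The dataset identifiers and field names below match those copies at the time of writing and may differ across versions.

```python
from datasets import load_dataset

# Publicly hosted copies of the two benchmarks on the Hugging Face Hub.
mbpp = load_dataset("mbpp", split="test")
humaneval = load_dataset("openai_humaneval", split="test")

print(mbpp[0]["text"])         # natural-language description of a Python task
print(humaneval[0]["prompt"])  # function signature and docstring to complete
```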

Meta reports that its models solved 17% more problems on MBPP and 12% more on HumanEval than comparable LLMs that generate tokens one at a time. Moreover, the models achieved up to a threefold increase in output speed.