Highlights:
- Meta’s researchers suggest that generating output four tokens at a time may help overcome limitations of the teacher-forcing approach used to train LLMs.
- Meta has developed four new models for code generation tasks, each with 7 billion parameters.
Meta Platforms Inc. has released four open-source language models that utilize a cutting-edge machine-learning technique called multi-token prediction.
Meta announced the models recently and has shared their code on Hugging Face, a widely used platform for hosting AI projects.
Large language models typically produce output one token at a time, whether that output is text or code. A token is a small unit of data, usually a word, a word fragment or a handful of characters. Meta’s new open-source models, in contrast, produce four tokens at once. The company believes this processing technique, known as multi-token prediction, can make LLMs both more accurate and faster.
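To make the difference concrete, the schematic Python below contrasts the two decoding loops. The functions `predict_next` and `predict_next_four` are hypothetical stand-ins for a model’s forward pass, not Meta’s actual API.

```python
def decode_one_at_a_time(predict_next, tokens, steps):
    """Conventional LLM decoding: one forward pass per generated token."""
    for _ in range(steps):
        tokens.append(predict_next(tokens))  # one new token per pass
    return tokens

def decode_four_at_a_time(predict_next_four, tokens, steps):
    """Multi-token decoding: each forward pass yields four tokens,
    so the same output needs roughly a quarter of the passes."""
    for _ in range(steps // 4):
        tokens.extend(predict_next_four(tokens))  # four new tokens per pass
    return tokens
```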
Meta has introduced four new models designed for code generation tasks, each with 7 billion parameters. Two of the models were trained on 200 billion tokens of code samples, while the other two were trained on 1 trillion tokens each. In a paper accompanying the models, Meta also revealed that it has developed a fifth, not-yet-released LLM with 13 billion parameters.
Internally, each model consists of two primary components. The first is a shared trunk that performs the initial computations required to generate a code snippet. The second, Meta explains, is a series of four output heads that manage the subsequent steps of the generation process. Each head produces one token at a time, which is how the models generate four tokens simultaneously.
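Based on that description, here is a minimal PyTorch sketch of the two-part design: a shared trunk followed by four independent output heads. The layer sizes and the use of a generic Transformer encoder are illustrative assumptions, not Meta’s actual configuration.

```python
import torch
import torch.nn as nn

class MultiTokenPredictionModel(nn.Module):
    """Sketch of a shared trunk feeding four per-offset output heads."""

    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_out_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Shared trunk: does the bulk of the computation once per step.
        block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(block, num_layers=n_layers)
        # Four output heads: head i predicts the token at offset i + 1.
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_out_heads)]
        )

    def forward(self, token_ids):
        seq_len = token_ids.size(1)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(seq_len)
        hidden = self.trunk(self.embed(token_ids), mask=causal_mask)
        last = hidden[:, -1, :]  # trunk state at the current position
        # Each head emits logits for one of the next four tokens.
        return [head(last) for head in self.heads]

# Usage: one forward pass yields logits for four future tokens at once.
model = MultiTokenPredictionModel()
prompt = torch.randint(0, 32000, (1, 16))        # batch of one, 16 tokens
logits_per_head = model(prompt)                   # four (batch, vocab) tensors
next_four = [l.argmax(dim=-1) for l in logits_per_head]
```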
Exactly why this approach yields higher-quality code than traditional LLM designs is not entirely clear. Meta’s researchers suggest in their paper that the answer may lie in how language models are typically built and trained.
Developers often train LLMs with a method called teacher-forcing. During training, the model is fed the correct preceding tokens at each step rather than its own, possibly erroneous, predictions. This streamlines the training process, but it can limit the accuracy of the finished LLM.
Meta’s researchers suggest that generating output four tokens at a time could alleviate the constraints associated with teacher-forcing. “Teacher-forcing, we argue, encourages models to focus on predicting well in the very short term, at the potential expense of ignoring longer-term dependencies in the overall structure of the generated sequence,” they wrote.
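In training terms, the idea is that the loss covers four future positions instead of one. Below is a hedged sketch of such an objective in PyTorch; `logits_per_head` is assumed to come from a model shaped like the earlier sketch, and the unweighted sum over offsets is an assumption, not Meta’s exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_token_loss(logits_per_head, token_ids, position):
    """Sum cross-entropy over the next four ground-truth tokens.

    logits_per_head: list of four (batch, vocab) tensors, one per head.
    token_ids: (batch, seq) ground-truth sequence used for supervision.
    position: index of the current token; head i targets position + i + 1.
    """
    loss = torch.tensor(0.0)
    for offset, logits in enumerate(logits_per_head, start=1):
        targets = token_ids[:, position + offset]  # ground truth at each offset
        loss = loss + F.cross_entropy(logits, targets)
    return loss
```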
Meta evaluated the accuracy of its multi-token prediction models using the MBPP and HumanEval benchmark tests. MBPP comprises roughly 1,000 entry-level Python coding tasks, while HumanEval poses a smaller but more challenging set of hand-written Python programming problems.
Meta reports that its models solved 17% more problems on MBPP and 12% more on HumanEval than comparable LLMs that generate tokens one at a time. They also produced output roughly three times faster.