Highlights:

  • Among the three new models announced, the most significant is Gemma 2 2B, a lightweight LLM crafted for text generation and analysis.
  • Google noted that Gemma 2 2B not only excels in size efficiency but also scored 56.1 on the Massive Multitask Language Understanding benchmark.

Google LLC is expanding its Gemma 2 large language model family with three new models that advance its open-source AI initiatives. The latest additions are described as being “smaller, safer, and more transparent” than many other models in the field.

The company launched its initial Gemma models in February. These models differ from the flagship Gemini models, which are used in the company’s own products and services and are generally seen as more advanced. The main difference is that the Gemma models are much smaller and entirely open-source, so they are free to use. In contrast, the Gemini models are larger, closed-source, and require developers to pay for access.

The Gemma models are built on the same research as the Gemini LLMs. They reflect Google’s initiative to build goodwill within the AI community, akin to Meta Platforms Inc.’s approach with its Llama models.

Among the three new models announced recently, the most significant is Gemma 2 2B, a lightweight LLM tailored for generating and analyzing text. Google states that it is designed to run on local devices like laptops and smartphones and is licensed for use in both research and commercial applications.
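For readers who want to experiment, the model is distributed through Hugging Face (noted at the end of this article). Below is a minimal sketch of what running it locally might look like with the widely used Transformers library; the google/gemma-2-2b-it checkpoint name matches the published instruction-tuned variant, but the prompt and generation settings are illustrative only.

```python
# Minimal sketch: running Gemma 2 2B locally with Hugging Face Transformers.
# Assumes `pip install torch transformers accelerate` and access to the
# google/gemma-2-2b-it checkpoint (gated; requires accepting the license).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"  # instruction-tuned 2B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # small enough for a laptop GPU or CPU
    device_map="auto",           # let accelerate place the weights
)

prompt = "Summarize the difference between open and closed LLMs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```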

Despite having only 2.6 billion parameters, Google claims that Gemma 2 2B delivers performance comparable to, and at times even surpassing, much larger models such as OpenAI’s GPT-3.5 and Mistral AI’s Mixtral 8x7B.

To support its claims, Google released results from independent testing conducted by the AI research organization LMSYS. LMSYS reported that Gemma 2 2B scored 1,126 in its Chatbot Arena evaluation, outperforming Mixtral-8x7B, which scored 1,114, and GPT-3.5-Turbo-0314, which scored 1,106. These results are particularly impressive considering that the latter models have nearly ten times more parameters than the latest Gemma edition.

Google highlighted that Gemma 2 2B’s strengths extend beyond its size efficiency. It scored 56.1 on the Massive Multitask Language Understanding benchmark and 36.6 on the Mostly Basic Python Programming (MBPP) test, showing improved performance compared to earlier Gemma models.

These results challenge the notion in AI that a larger number of parameters necessarily leads to better performance. Instead, Gemma 2 2B demonstrates that by utilizing advanced training techniques, superior architectures, and higher-quality training data, it’s possible to achieve high performance even with a smaller parameter count.

Google suggested that its work could prompt AI companies to shift away from prioritizing ever-larger models and instead focus on refining existing ones to enhance their performance.

Furthermore, Google highlighted that Gemma 2 2B underscores the importance of model compression and distillation techniques. The company explained that Gemma 2 2B was created by distilling knowledge from significantly larger models, and it anticipates that advances in this area will enable more accessible AI systems with lower computational demands.
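To make the idea concrete, here is an illustrative sketch of the general knowledge-distillation technique the company is referring to; it is not Google’s training code. A small “student” model is trained to match the softened output distribution of a larger, frozen “teacher”:

```python
# Illustrative sketch of knowledge distillation in PyTorch (not Google's
# actual training pipeline). The student minimizes the KL divergence
# between its output distribution and the teacher's, both softened by a
# temperature so the student also learns the teacher's relative confidences.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Typical training step (teacher frozen, student updated):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
```

The temperature is the key design choice: softening both distributions lets the student learn from how the teacher spreads probability across plausible alternatives, not just from its single top prediction.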

Google has also recently introduced several specialized models, including ShieldGemma. This model comprises a set of safety classifiers intended to identify and filter out toxic content such as hate speech, harassment, and sexually explicit material. ShieldGemma builds upon the original Gemma 2 model and is designed to help developers filter harmful prompts that may lead models to generate undesirable responses. Additionally, it can be used to filter the responses produced by large language models.
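In practice, a classifier like ShieldGemma can sit in front of a chat model and score each incoming prompt before it is answered. The sketch below illustrates that general pattern, assuming the google/shieldgemma-2b checkpoint published on Hugging Face; the policy wording and prompt template are simplified stand-ins for the official ones:

```python
# Hedged sketch of prompt filtering with a ShieldGemma-style classifier.
# The policy text and template below are illustrative simplifications.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/shieldgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

user_prompt = "How do I pick a lock?"
classifier_input = (
    "You are a policy expert. Does the following request violate the policy "
    "against facilitating harm? Answer Yes or No.\n\n"
    f"Request: {user_prompt}\nAnswer:"
)
inputs = tokenizer(classifier_input, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Compare the probabilities of "Yes" (violation) vs. "No" (safe).
yes_id = tokenizer.convert_tokens_to_ids("Yes")
no_id = tokenizer.convert_tokens_to_ids("No")
probs = torch.softmax(logits[[yes_id, no_id]], dim=0)
print(f"P(violation) = {probs[0]:.2f}")
```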

Lastly, Gemma Scope aims to enhance transparency for the Gemma 2 models. It does so by zooming in on specific components of the models, helping developers understand their internal mechanisms.

“Gemma Scope is made up of specialized neural networks that help us unpack the dense, complex information processed by Gemma 2, expanding it into a form that’s easier to analyze and understand. By studying these expanded views, researchers can gain valuable insights into how Gemma 2 identifies patterns, processes information and ultimately makes predictions,” Google wrote in a blog post.
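The “specialized neural networks” Google refers to are sparse autoencoders: small models trained to expand a layer’s dense activations into a much wider set of features, most of which stay inactive for any given input, which makes the active ones easier to interpret. A minimal sketch of the idea follows; the dimensions and sparsity penalty are illustrative, not Gemma Scope’s actual configuration:

```python
# Minimal sketch of a sparse autoencoder of the kind used for
# interpretability work like Gemma Scope (illustrative, not Google's code).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=2304, d_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # expand activations
        self.decoder = nn.Linear(d_features, d_model)  # reconstruct them

    def forward(self, activations):
        # ReLU keeps most features at exactly zero, so each activation is
        # explained by a small set of active, more interpretable features.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder()
acts = torch.randn(8, 2304)  # stand-in for activations from one model layer
recon, feats = sae(acts)
# Training balances faithful reconstruction against an L1 sparsity penalty.
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
```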

Gemma 2 2B, ShieldGemma, and Gemma Scope can now be downloaded from various sources, including Hugging Face.