Highlights:
- Mistral Small 3 has 24 billion parameters, far fewer than today's most advanced LLMs.
- While developing Mistral Small 3, the company created the base model and skipped refinement, enabling users to fine-tune it to meet their project needs.
Mistral AI and Ai2, the Allen Institute for AI, have each launched a new large language model, asserting that it ranks among the most advanced in its category.
Mistral’s latest model, Mistral Small 3, and the Allen Institute for AI’s (Ai2) new LLM, Tülu 3 405B, are both released under an open-source license.
Mistral Small 3 features 24 billion parameters, far fewer than the most advanced LLMs available. This compact size allows it to run on select MacBooks with quantization enabled—a technique that reduces hardware demands by slightly compromising output quality.
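To illustrate what quantization does, here is a minimal sketch (not Mistral's actual implementation) of symmetric int8 quantization: float32 weights are mapped to 8-bit integers plus a scale factor, cutting memory use roughly fourfold while introducing a small rounding error.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Map float32 weights into the int8 range [-127, 127] with one
    # shared scale factor (a deliberately simplified scheme).
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 weights for use at inference time.
    return q.astype(np.float32) * scale

weights = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q.nbytes, weights.nbytes)          # 1024 vs. 4096 bytes: 4x smaller
print(np.abs(weights - restored).max())  # small per-weight rounding error
```

Production systems use more sophisticated schemes (per-channel scales, 4-bit formats), but the trade-off is the same: less memory, slightly degraded precision.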
In an internal evaluation, Mistral compared Mistral Small 3 to Meta’s Llama 3.3 70B Instruct, an open-source LLM with over three times the parameters. Despite its smaller size, Mistral Small 3 produced comparable output quality with significantly faster response times. In another test, it outperformed OpenAI’s GPT-4o mini in both output quality and latency.
LLMs are typically built by first creating a base model and then improving its output quality through various training methods. When developing Mistral Small 3, the company created the base model but omitted the refinement phase, allowing users to fine-tune it themselves to suit their project needs.
Mistral envisions developers using its LLM for various applications. The company highlights its effectiveness in powering AI automation tools that perform tasks in external applications with minimal latency. Additionally, several customers are leveraging Mistral Small 3 for industry-specific use cases in robotics, financial services, and manufacturing.
Mistral researchers stated in a blog post, “Mistral Small 3 is a pre-trained and instructed model catered to the ‘80%’ of generative AI tasks — those that require robust language and instruction following performance, with very low latency.”
The launch of Mistral Small 3 coincided with a new LLM release from Ai2, a nonprofit AI institute. Tülu 3 405B is a customized version of Meta’s open-source Llama 3.1 405B model, which debuted last June. According to Ai2’s testing, Tülu 3 405B outperformed the original Llama model across multiple benchmarks.
The research team developed the LLM using a process first outlined in November, integrating various training methods, including one uniquely created by Ai2.
The workflow begins with supervised fine-tuning, a training method in which an LLM is provided with sample prompts and corresponding answers to help it learn how to respond to user queries. Following this, Ai2 applied DPO, or direct preference optimization, to align Tülu 3 405B’s output with user preferences.
Ai2 further enhanced the model’s performance using RLVR, short for reinforcement learning with verifiable rewards, an internally developed method based on reinforcement learning, a common AI training technique. According to Ai2, RLVR improves AI models’ ability to perform tasks such as solving math problems.
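The key idea behind RLVR is that some tasks have answers that can be checked programmatically. A minimal sketch of that idea (not Ai2's implementation; the `####` answer marker is a hypothetical convention): a verifier compares the model's final answer against the ground truth and emits a binary reward, which then drives a standard reinforcement-learning update instead of a learned reward model's score.

```python
def verifiable_reward(model_output: str, ground_truth: str) -> float:
    # Extract the final answer after a "####" marker (assumed format),
    # then grant reward 1.0 only on an exact match with the ground truth.
    answer = model_output.split("####")[-1].strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

print(verifiable_reward("12 * 7 = 84 #### 84", "84"))  # 1.0: correct answer
print(verifiable_reward("12 * 7 = 82 #### 82", "84"))  # 0.0: wrong answer
```

Because the reward comes from a checker rather than a learned model, it cannot be gamed in the way learned reward models sometimes are, which is why math and other verifiable domains suit this approach.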
Tülu 3 405B represents “the first application of fully open post-training recipes to the largest open-weight models. With this release, we demonstrate the scalability and effectiveness of our post-training recipe applied at 405B parameter scale,” Ai2 researchers wrote in a blog post.