Highlights:

  • With its new image reasoning capabilities, Pixtral 12B should be able to answer questions about images, generate captions, count objects, and more.
  • Now that the model is available for download, developers can fine-tune and adapt it for their own needs.

Paris-based AI company Mistral AI has launched Pixtral 12B, a cutting-edge multimodal AI model that can process both images and text.

The new model is the first of its kind from Mistral, able to interpret images alongside text thanks to vision encoding, and it runs on more than 12 billion parameters.

The new model is built on Nemo 12B, a text-comprehension AI model Mistral released previously, with a 400 million-parameter vision adapter added on top. Thanks to the adapter, users can attach images to their text prompts either via URLs or as base64-encoded data.
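As a rough illustration of what a combined image-and-text prompt could look like, here is a minimal sketch in Python. The message schema, field names, and file path are assumptions modeled on common chat-completion formats, not confirmed Pixtral documentation:

```python
import base64

# Load a local image and base64-encode it (hypothetical file path).
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# A user message mixing text with an image referenced by URL and a
# second image passed inline as base64 data. The structure below is
# an assumed, OpenAI-style format for illustration only.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "How many people are in these pictures?"},
        {"type": "image_url", "image_url": "https://example.com/crowd.jpg"},
        {"type": "image_url",
         "image_url": f"data:image/jpeg;base64,{image_b64}"},
    ],
}
```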

Numerous other large language models, such as Google LLC's Gemini, Anthropic PBC's Claude family, and OpenAI's GPT-4o, already offer multimodal capabilities that let users input images. With its image reasoning skills, Pixtral 12B should likewise be able to answer questions about images, generate captions, count objects, and more.

The company made the code and model weights available via a torrent link, as well as on the AI distribution portal Hugging Face and on GitHub, and it encourages developers to download and start using the model.
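For developers who prefer pulling the weights from Hugging Face rather than the torrent, a minimal sketch using the huggingface_hub library might look like the following; the repository id shown is an assumption:

```python
from huggingface_hub import snapshot_download

# Fetch all files in the model repository to a local directory.
# The repo id below is an assumption; check Mistral's Hugging Face
# page for the actual repository name.
local_dir = snapshot_download(
    repo_id="mistralai/Pixtral-12B-2409",
    local_dir="pixtral-12b",
)
print(f"Model files downloaded to {local_dir}")
```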

Now that the model is available for download, developers can fine-tune and adapt it for their own needs. Some of the company's models are released under the permissive Apache 2.0 license with no restrictions; for the rest, Mistral offers a free license for research and development, while commercial, non-research applications require a paid license. The company has yet to disclose which license category Pixtral 12B will fall under.

According to a post on X by Sophia Yang, Mistral's head of developer relations, the model will be available for testing on Mistral's chatbot, Le Chat, and on its API platform, La Plateforme.
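For those who want to try the model through the API rather than the chatbot, a query might look something like the sketch below. The endpoint path, model identifier, and message schema are all assumptions; consult La Plateforme's documentation for the real values:

```python
import os
import requests

# Hypothetical request against an assumed chat-completions endpoint.
# The URL, model name, and payload format are not confirmed by the
# article and should be verified against Mistral's API docs.
response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "pixtral-12b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url",
                     "image_url": "https://example.com/photo.jpg"},
                ],
            }
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```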