Highlights:

  • DeepSeek recently made headlines with the release of its frontier R1 LLM, which the company claimed was trained at a significantly lower cost than similar-sized models.
  • To create Janus Pro, DeepSeek stated that it developed a “novel autoregressive framework that unifies multimodal understanding and generation.”

DeepSeek, the Chinese AI startup known for its highly popular DeepSeek AI chatbot and for offering an alternative to the large language models behind OpenAI’s ChatGPT, has launched Janus Pro 7B, an advanced AI image generation model.

The startup recently made headlines with the release of its frontier R1 LLM, which the company claimed was trained at a significantly lower cost than similar-sized models. Following the announcement, shares of Nvidia Corp., the leading provider of advanced AI chips, along with those of other tech companies in the AI sector, declined sharply. The R1 model’s impressive capabilities, lower training costs, and reduced deployment expenses suggest that the company could hold a significant competitive advantage.

DeepSeek has now unveiled Janus Pro, an image generation model designed for versatility and efficiency. This advanced version builds upon the Janus model, released last year, enhancing text-to-image creation by scaling up the training data and increasing the model’s size.

The company asserts that the new Janus-Pro-7B model surpasses current AI models like OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion in image generation leaderboard rankings.

As with other image generation models, users can provide a text description of a photo or artwork, and Janus Pro will generate an image based on the input. The company also said that Janus Pro offers both image generation and image analysis (computer vision) capabilities, which means users can upload an image for the model to caption or ask questions about what it “sees.”

To create Janus Pro, DeepSeek explained that it developed a “novel autoregressive framework that unifies multimodal understanding and generation.” The company noted that the model addresses the limitations of earlier approaches by decoupling visual encoding into separate pathways for understanding and generation while still using a single, unified transformer for processing. According to DeepSeek, this decoupling eases the conflict between the visual encoder’s two roles and enhances the framework’s flexibility.
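For readers who want a feel for the design, DeepSeek’s model card describes separate visual encoding pathways that feed one shared transformer. The toy Python sketch below illustrates only that routing idea. It is not DeepSeek’s code: the function names, the two-number “embeddings,” and the running-mean stand-in for the transformer are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Embedding = List[float]
DIM = 2  # toy embedding width shared by both pathways (assumption)


def encode_for_understanding(pixels: List[float]) -> List[Embedding]:
    # Hypothetical understanding pathway: continuous features per pixel.
    return [[p, 1.0 - p] for p in pixels]


def encode_for_generation(pixels: List[float]) -> List[Embedding]:
    # Hypothetical generation pathway: quantize each pixel (expected in
    # the range 0..1) to a discrete code 0..3, then embed the code.
    codes = [min(int(p * 4), 3) for p in pixels]
    return [[float(c), 0.0] for c in codes]


# Two separate encoders, selected by task.
ENCODERS: Dict[str, Callable[[List[float]], List[Embedding]]] = {
    "understanding": encode_for_understanding,
    "generation": encode_for_generation,
}


def shared_backbone(sequence: List[Embedding]) -> List[Embedding]:
    # Stand-in for the single unified transformer: a causal running
    # mean, so each position depends only on itself and earlier ones.
    out: List[Embedding] = []
    sums = [0.0] * DIM
    for i, emb in enumerate(sequence, start=1):
        sums = [s + e for s, e in zip(sums, emb)]
        out.append([s / i for s in sums])
    return out


def run(task: str, pixels: List[float]) -> List[Embedding]:
    # Route the image through the task-specific encoder, then through
    # the one backbone that both tasks share.
    if task not in ENCODERS:
        raise ValueError(f"unknown task: {task}")
    return shared_backbone(ENCODERS[task](pixels))
```

Calling `run("understanding", pixels)` and `run("generation", pixels)` on the same input sends it through different encoders but the same backbone, which is the separation the company describes.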

“Janus surpasses previous unified model and matches or exceeds the performance of task-specific models. The simplicity, high flexibility, and effectiveness of Janus make it a strong candidate for next-generation unified multimodal models,” the company said in introducing the Janus Pro model on its Hugging Face repository.

Image generation models like Janus are especially appealing to businesses and marketing firms because they can produce realistic and intricate images, including faces, objects, and logos, at scale. These models help save time and reduce costs in creative production, particularly for creating custom imagery used in advertising, blogs, social media, and product images.

As with its text generation model DeepSeek-R1, the company has made Janus-Pro-7B free and open source under the MIT license. A demo of the model can be accessed on Hugging Face.