Highlights:

  • Baidu, the Chinese internet search giant, launched Ernie-X1—its first reasoning-focused multimodal foundational model—aimed at competing with DeepSeek.
  • Though Qwen2.5-Omni-7B has just seven billion parameters, Alibaba Cloud claims the model delivers strong performance and advanced multimodal capabilities.

Alibaba Cloud unveiled Qwen2.5-Omni-7B, a new AI model in its Qwen series, designed to understand text, audio, and video content while also enabling real-time voice conversations.

The company stated that the new model is compact enough to run on mobile phones and similar devices.

Alibaba Cloud stated that despite its compact size of just seven billion parameters, the model delivers high performance and advanced multimodal capabilities. It can process video inputs from cameras and monitor on-screen activity as users interact with their devices, enabling real-time responses. This allows it to integrate seamlessly with applications for interactive conversations.

The company said in the announcement, “This unique combination makes it the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.”

Users can leverage the model for real-time support while shopping, for step-by-step cooking guidance based on ingredients it identifies in videos, or for reading and summarizing PDFs to simplify research. Its video analysis capabilities also make it especially helpful for visually impaired users, enabling them to navigate their surroundings by reading signs, recognizing context clues, and matching voices to faces.

The company has made the model open source, releasing it on platforms like Hugging Face and GitHub. It’s also available via Qwen Chat and through the company’s open-source community, ModelScope. In open-source development, the code and model weights are freely shared, allowing developers to use, modify, and distribute them. This collaborative approach encourages community-driven innovation, and so far, Alibaba Cloud has open-sourced more than 200 generative AI models.

Following the open-source release of DeepSeek-R1 by the China-based AI developer DeepSeek, Chinese companies have been gaining momentum in the AI market with notable model launches. DeepSeek’s R1 family introduced advanced reasoning capabilities, enabling the models to “think” through problems. More recently, Chinese tech giant Tencent Holdings Ltd. unveiled Hunyuan Turbo S, which the company claims surpasses the performance of R1.

Last week, Chinese internet giant Baidu introduced Ernie-X1, its first reasoning-focused multimodal foundational model, positioned to compete with DeepSeek.

In late January, Alibaba also rolled out an update to its largest AI model, Qwen2.5-Max, claiming it outperformed DeepSeek-V3, the latest non-reasoning model from DeepSeek.