Highlights:

  • Databricks stated that the acquisition would deliver MosaicML’s technology on its data and AI infrastructure via the Lakehouse Platform, which would provide customers with new access to AI training, fine-tuning, and deployment.
  • MosaicML offers its own generative AI large language models that businesses can train and fine-tune on their own data at a low cost and with high quality.

Databricks Inc., a provider of big data and machine learning tools, announced a USD 1.3 billion acquisition of generative artificial intelligence company MosaicML Inc. to make open-source AI models more accessible to enterprises.

Enterprises are beginning to incorporate generative AI into their workflows at a time when the technology has captured the attention of every industry. Generative AI applications can produce original text, images, and computer code from users' natural language input. These capabilities put the technology on the map with the release of OpenAI LP's ChatGPT, a chatbot that answers queries in natural language.

MosaicML offers its own MPT family of generative AI large language models, which businesses can train and fine-tune on their own data at low cost and with high quality. The MPT-7B model has been downloaded more than 3.3 million times. The company has just released MPT-30B, which has 30 billion parameters, far fewer than the 175 billion in OpenAI's GPT-3.

Databricks stated that the acquisition would deliver MosaicML's technology on its data and AI infrastructure via the Lakehouse Platform, giving customers new access to AI training, fine-tuning, and deployment. Notably, customers would be able to serve and customize generative AI models at relatively low cost while retaining control, security, and ownership of their own data.

Ali Ghodsi, Co-founder and Chief Executive of Databricks, said, “Every organization should be able to benefit from the AI revolution with more control over how their data is used. Databricks and MosaicML have an incredible opportunity to democratize AI and make the Lakehouse the best place to build generative AI and LLMs.”

MosaicML’s deployment service for training, fine-tuning, and inference is already utilized by several clients, including the non-profit research institute Allen Institute for AI, Generally Intelligent, Scatter Labs, and the browser-based coding tool Replit. Hippocratic AI, a company developing a chatbot for the healthcare industry, uses MosaicML to train its generative AI model.

Naveen Rao, Co-founder and Chief Executive of MosaicML, said, "We started MosaicML to solve the hard engineering and research problems necessary to make large scale training more accessible to everyone. With the recent generative AI wave, this mission has taken center stage. Together with Databricks, we will tip the scales in the favor of many."

With the ability to host their models in the Databricks Lakehouse, businesses can specialize and customize their models on company-specific data and deploy them securely. This matters because foundation models can only answer generic queries and perform general tasks, whereas law firms, healthcare providers, and other specialized businesses need models that handle domain-specific work.

In addition, many companies are hesitant to share proprietary data with AI providers because of privacy and security concerns. Sending data to externally hosted models such as OpenAI's could expose a company to data breaches and other threats. This is particularly important in highly regulated industries, where data must be controlled and stored within secure databases and the model and data must be kept together.

John Furrier, Chief Executive Officer and Chief Analyst of a prominent media company, said that in the future AI would be more valuable to businesses than software, because data is the intellectual property.

He stated, “Enterprises want their own models to build, tune and run. Enterprises don’t want to share it or have it uploaded into a public LLM. This is the trend that Databricks is getting in front of and it’s all about the new data developer.”

According to Justin DeBrabant, Senior Vice President of Product at enterprise customer data platform provider ActionIQ Inc., Databricks can now extend its platform for creating, training, and hosting conventional machine learning models to large language models.

DeBrabant said, "That means Databricks offers products and services on the Lakehouse that extend from ETL [data management] tools to SQL analytics to custom machine learning and now to hosted LLMs. That's pretty compelling."

Databricks competes in the same market as Snowflake Inc., a cloud-based data warehouse provider that recently purchased Neeva Inc., a company creating an AI-based search engine based on a large language model. This week, both firms are conducting their annual user conferences.