Highlights:
- Okera’s platform offers an AI-powered method that can find, categorize, and tag sensitive data, such as personally identifiable information.
- According to Databricks, Okera’s capabilities will be incorporated into Unity Catalog, its governance framework for data and AI workloads.
Databricks Inc., a big data analytics company, announced that it has acquired Okera Inc., an artificial intelligence-focused data governance platform, to strengthen its governance and compliance capabilities for machine learning and large language models (LLMs).
The two companies did not disclose the terms of the agreement. Okera has reportedly raised just under USD 30 million in funding, including its most recent round, a USD 10 million Series B led by Clear Sky in June 2020.
Since the recent emergence of generative AI models like OpenAI LP’s ChatGPT, there has been a surge in interest from enterprise customers who want to integrate such models into their networks. Because LLMs are trained on sizable datasets and can reproduce parts of them in their output, they can easily absorb and leak sensitive data, which has also raised concerns about the security and privacy of the training data used by LLMs.
Previously, organizations managed access to data using straightforward controls that were limited to a single plane, such as a database or a language like SQL. As long as the data was accessed through SQL, it was possible to construct policies that governed SQL queries effectively.
Chief Executive Ali Ghodsi and the Databricks team said, “The rise of AI, in particular machine learning models and LLMs, is making this approach insufficient.” They noted that the emergence of LLMs has caused the amount of data that businesses need to govern to keep growing because “data sources used in AI are machine-generated instead of human-generated.” In addition, existing approaches to creating policies cannot keep pace with fast-moving AI development.
They also stated, “AI-specific governance concerns such as provenance and bias fall outside the reach of traditional data governance platforms.”
To help solve these issues, Okera’s platform offers an AI-powered method that can find, categorize, and tag sensitive data, such as personally identifiable information. Then, using a no-code interface, developers or managers can use these tags to produce access policies, improving data control and transparency. Enterprise clients can then monitor data usage and gain a better understanding of what is going on inside their own systems.
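The tag-then-policy flow described above can be illustrated with a minimal Python sketch. It is not Okera's implementation: the regex detectors stand in for the AI-powered classifier, and every name here (`DETECTORS`, `tag_columns`, `Policy`, `mask_row`, the roles and the sample row) is hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors: simple regexes standing in for an AI classifier.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tag_columns(rows):
    """Scan sample rows and return {column: set of sensitivity tags}."""
    tags = {}
    for row in rows:
        for column, value in row.items():
            column_tags = tags.setdefault(column, set())
            for tag, pattern in DETECTORS.items():
                if pattern.search(str(value)):
                    column_tags.add(tag)
    return tags


@dataclass
class Policy:
    """Tag-based access policy: maps each tag to the roles allowed to read it."""
    allowed_roles: dict = field(default_factory=dict)

    def can_read(self, role, column_tags):
        # Untagged columns are readable by everyone; tagged columns require
        # the role to be allowed for every tag on the column.
        return all(role in self.allowed_roles.get(tag, set()) for tag in column_tags)


def mask_row(row, tags, policy, role):
    """Return a copy of the row with values the role may not read masked."""
    return {
        column: value if policy.can_read(role, tags.get(column, set())) else "***"
        for column, value in row.items()
    }


# Usage with made-up sample data.
rows = [{"name": "Ada", "contact": "ada@example.com", "ssn": "123-45-6789"}]
tags = tag_columns(rows)
policy = Policy({"email": {"analyst", "admin"}, "us_ssn": {"admin"}})
analyst_view = mask_row(rows[0], tags, policy, "analyst")
```

In this sketch the policy is written against tags rather than against individual tables or columns, so newly discovered sensitive columns are covered as soon as they are tagged.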
Additionally, Okera offers technology that enables businesses to isolate workloads without compromising performance. This would boost security and privacy by enabling many LLMs to run concurrently without blending data sets or unintentionally sharing or leaking potentially sensitive information between AI models.
Databricks recently launched Dolly 2.0, a new version of its own specialized open-source LLM with features resembling ChatGPT. The model is much smaller and more portable than many others on the market and, most importantly, its training data permits commercial use.
According to Databricks, Okera’s capabilities will be incorporated into Unity Catalog, its governance framework for data and AI workloads. Using Okera’s AI-driven system, enterprise customers will be able to classify and govern all of their data, analytics, and AI assets, including ML models and other features. They will receive the tags they need to create attribute- and intent-based data usage control policies.
Databricks said that with these developments, enterprise customers will gain “a holistic view of their data estate across clouds” and will be able to “use a single permission model to define access policies,” helping them ensure persistent governance.
As part of the acquisition, the Okera team will join Databricks. This includes the co-founder and CEO of Okera, Nong Li, who is credited with creating Apache Parquet, an open-source column-oriented data format for efficient data storage and retrieval on which Databricks and numerous other software businesses rely.
“We founded Okera to help modern, data-driven enterprises accelerate legitimate data access while minimizing data security risks and delivering regulatory compliance. Many organizations don’t have enough technical talent to manage access policies at scale, especially with the explosion of LLMs. What they need is a modern, AI-centric governance solution,” Li stated.