Highlights:

  • After uploading sample data, developers specify how many question-and-answer pairs the API should generate.
  • Databricks plans to roll out several API improvements early next year.

Databricks Inc. launched an application programming interface (API) to help users generate synthetic data for machine learning (ML) projects.

The API is accessible through Mosaic AI Agent Evaluation, a tool in the company’s data lakehouse platform that helps developers assess the latency, cost, and output quality of artificial intelligence applications. Mosaic AI Agent Evaluation debuted in June alongside Mosaic AI Agent Framework, a companion tool that eases the deployment of retrieval-augmented generation applications.

Synthetic data is information generated with AI specifically for use in developing neural networks. Compared with building training datasets manually, the approach is significantly faster and more economical. Databricks’ new API is designed to generate question-and-answer collections, which are useful for building applications powered by large language models.

There are three steps involved in creating a dataset using the API.

Developers must first upload a dataframe, or collection of files, that contains business information relevant to the task their AI application will perform. The dataframe must be in a format supported by Pandas or Apache Spark. Pandas is a widely used analytics library for the Python programming language, while Spark is the open-source data processing engine that powers Databricks’ platform.
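For illustration, here is a minimal sketch of that first step, assuming a Pandas dataframe with one row per document. The column names ("content" for the document text, "doc_uri" for its source location) and the file paths are illustrative assumptions rather than a confirmed schema:

```python
import pandas as pd

# Sketch of step one: gather business documents into a Pandas DataFrame.
# The column names and URIs below are illustrative assumptions, not a
# confirmed schema; check the Databricks documentation for the exact format.
docs = pd.DataFrame(
    [
        {
            "content": "Premium support plans guarantee a first response "
                       "within four business hours.",
            "doc_uri": "s3://example-bucket/support/premium_plan.md",
        },
        {
            "content": "Approved refunds are processed within five "
                       "business days.",
            "doc_uri": "s3://example-bucket/policies/refunds.md",
        },
    ]
)
```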

After uploading the sample data, developers specify how many questions and answers the API should generate. They can optionally provide additional guidance to shape the API’s output: a software team can describe the end users who will interact with the AI application, the task for which the questions will be used, and the style in which the questions should be written.
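As a rough sketch of how that second step might look in code, the example below assumes the API is exposed through a generate_evals_df helper in Databricks’ agents package; the import path, parameter names, and guideline format are assumptions based on public documentation and may differ:

```python
import pandas as pd

# Assumed import path; verify against current Databricks documentation.
from databricks.agents.evals import generate_evals_df

# One-row stand-in for the document frame built in the previous sketch.
docs = pd.DataFrame(
    [
        {
            "content": "Approved refunds are processed within five business days.",
            "doc_uri": "s3://example-bucket/policies/refunds.md",
        }
    ]
)

evals = generate_evals_df(
    docs,
    num_evals=25,  # how many question-answer pairs to generate
    # Optional guidance: who the end users are and what the agent does...
    agent_description=(
        "A customer-support assistant that answers questions about "
        "billing and refund policies."
    ),
    # ...and the style the generated questions should follow.
    question_guidelines=(
        "# User personas\n"
        "- A customer asking about a refund\n"
        "# Example questions\n"
        "- How long does a refund take?\n"
        "# Additional guidelines\n"
        "- Keep questions short and conversational."
    ),
)

print(evals.head())
```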

Inaccurate training data can lower the quality of an AI model’s output. For that reason, businesses often have subject matter experts check a synthetic dataset for inaccuracies before supplying it to a neural network. According to Databricks, the API was designed to make this review step easier.

“Importantly, the generated synthetic answer is a set of facts that are required to answer the question rather than a response written by the LLM,” Databricks engineers reported. “This approach has the distinct benefit of making it faster for an SME to review and edit these facts vs. a full, generated response.”
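To make the distinction concrete, a generated record might pair a question with a short list of required facts rather than a polished paragraph. The field names below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical shape of one generated record: the answer is a list of
# required facts, not a fully written LLM response. Field names are
# illustrative assumptions only.
example_eval = {
    "request": "How long do refunds take once they are approved?",
    "expected_facts": [
        "Refunds are processed after approval.",
        "Processing takes up to five business days.",
    ],
}

# A subject matter expert only needs to confirm or edit each short fact,
# which is faster than proofreading a full generated paragraph.
for fact in example_eval["expected_facts"]:
    print("-", fact)
```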

Databricks plans to roll out several API improvements early next year. A new graphical interface will let dataset reviewers more quickly evaluate question-answer pairs for problems and add more pairs where needed. The company will also include a tool for tracking how a business’s synthetic datasets evolve over time.