Highlights:
- Datasaur says its data labeling tool learns from the tasks it performs, becoming more accurate and effective over time.
- The platform helps companies turn large volumes of raw data into datasets ready for training more sophisticated NLP models.
Datasaur Inc., a natural language processing startup, recently announced that it has closed a USD 4 million seed funding round, bringing its total funding to USD 7.9 million.
The round was led by Initialized Capital, with participation from HNVR, TenOneTen, Gold House Ventures, and an early investment from OpenAI LP President Greg Brockman.
Datasaur has developed an intuitive, efficient platform for labeling the datasets used to train natural language processing (NLP) models. NLP, a subset of artificial intelligence, trains computers to understand spoken and written language much as humans do.
The startup explains that the NLP industry has reached a point where companies are increasingly interested in training models on their own proprietary datasets, which lets them tailor those models to perform specific tasks more effectively. As a result, businesses need a simple way to label and prepare their proprietary data for AI training.
This is where Datasaur’s NLP training platform comes in. It helps companies turn large volumes of raw data into datasets ready for training more sophisticated NLP models. According to Datasaur, its data labeling tool learns from the tasks it performs, becoming more accurate and effective over time. The platform also offers tools that help companies identify errors in their AI training datasets.
Andy Thurai, Vice President and Principal Analyst at Constellation Research Inc., said in an interview that the importance of an effective data labeling tool cannot be overstated. He explained that the data used to train generative AI and large language models, such as ChatGPT, must be carefully annotated for the models to be trained accurately. Thurai noted that poorly labeled data leads to inaccuracies in a model’s performance.
The analyst highlighted a notable feature of Datasaur’s platform: it lets data scientists bring in multiple domain experts as annotators, which helps mitigate bias and reduce inaccurate information. He stated, “It also provides an option to pre-label or pre-process data, thereby speeding up the annotation process. However, the company operates in a very competitive field and there are numerous rival platforms, such as Amazon Mechanical Turk, Appen, SuperAnnotate, DataLoop, V7 Darwin, Cogito, Hive, Edgecase and ClickWorker.”
Given the competitive data labeling landscape, Datasaur has announced plans to become a comprehensive NLP platform. To that end, the company has introduced a new product called Dinamic, which converts customer-labeled data into a customized NLP model with a single click.
By combining its data labeling tool with Dinamic, Datasaur says it has turned a formerly complex, multi-stage process into a simple two-step workflow: customers annotate data according to their specific business needs, then receive a fully trained NLP model with minimal effort. The company claims this approach could save customers “millions of dollars” in data science costs.
Ivan Lee, Founder and Chief Executive, said the company initially focused on data labeling because it is the most complex and time-consuming phase of NLP development. He explained, “Today we are in a perfect storm between the dizzying advancements in LLM technology alongside renewed vigor from business stakeholders in translating AI into cost savings and accelerated revenue generation.”
Datasaur’s most prominent achievement to date is its data labeling platform, which has been used by notable companies such as Google LLC, Qualtrics International Inc., and Spotify Inc. The platform has been used to label training data ranging from audio clips to text extracted from PDF and Word documents.
Brett Gibson, Managing Partner at Initialized Capital, said demand for NLP is so strong that he expects Datasaur’s platform to gain significant popularity. He stated, “We’re seeing companies in every industry and vertical rushing to discover how to apply ChatGPT-like technology to their own processes. Products like Datasaur Dinamic simplify and standardize the process for those new to the NLP space. We saw the potential in the NLP space in 2020 when we first invested in this team, and the time is ripe to capture the rapidly growing market.”