Highlights:

  • The company offers three starting points: a Python library, containers, and a cloud-hosted API.
  • Unstructured provides a platform to convert unstructured, internal data into large language model-compatible formats.

Unstructured Technologies Inc., a startup that prepares data for large language models, has raised USD 25 million in new funding to expand its operations and customer base.

Unstructured, founded in 2022 by former U.S. Central Intelligence Agency analyst Brian Raymond, provides a platform that enables businesses to convert their unstructured, internal data into formats compatible with large language models. LLMs are the type of AI model that underpins ChatGPT from OpenAI LP and other chatbots that produce human-like content and answers.

The company offers its users three entry points: an open-source Python library, containers, and an API hosted in the cloud. The API can process more than 20 different natural language file types, turning raw data into LLM-ready data. Unstructured also provides enterprise-grade data connectors for Azure Blob and OneDrive from Microsoft Corp., S3 from Amazon Web Services Inc., Cloud Storage and Google Drive from Google LLC, Dropbox Inc., and Elasticsearch Inc.
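
As a rough illustration of what turning raw data into LLM-ready data can mean, the sketch below splits plain text into labeled elements and serializes them as JSON. It uses only the Python standard library and is not Unstructured's library or API; the Element class, the split_into_elements function, and the title heuristic are all hypothetical names and assumptions made for this example.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Element:
    """One chunk of text plus a coarse label (hypothetical schema)."""
    type: str
    text: str


def split_into_elements(raw: str) -> List[Element]:
    """Split raw text on blank lines and label each block with a simple heuristic."""
    elements = []
    for block in raw.split("\n\n"):
        # Collapse line breaks and repeated spaces inside the block.
        text = " ".join(block.split())
        if not text:
            continue
        # Assumption: short blocks without closing punctuation are titles.
        is_title = len(text.split()) <= 8 and text[-1] not in ".!?:"
        elements.append(Element("Title" if is_title else "NarrativeText", text))
    return elements


def to_json(elements: List[Element]) -> str:
    """Serialize the elements into a JSON array that a downstream pipeline could ingest."""
    return json.dumps([asdict(e) for e in elements], indent=2)


raw_document = """Quarterly Report

Revenue grew in the third quarter,
driven by new enterprise customers.

Outlook

We expect continued growth next year."""

elements = split_into_elements(raw_document)
print(to_json(elements))
```

A production pipeline would also have to handle many file formats, tables, and metadata, which is the gap the company says its tooling addresses.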

The company’s technology was developed in collaboration with the open-source community, commercial businesses, and a few U.S. government defense and intelligence agencies. The U.S. Air Force and Space Force have awarded the business Phase I and Phase II Small Business Innovation Research contracts, and the U.S. Special Operations Command has provided additional support.

FINSMES reports that Unstructured has had a contract with SOCOM since the business was founded. Under that agreement, Unstructured has worked with SOCOM to launch the first standalone system within the U.S. armed forces that combines an LLM with mission-critical data.

In an interview, Raymond described how the company solves the data fragmentation problem arising from businesses’ daily production of enormous amounts of unstructured data. Raymond said, “The dirty secret in the [natural language processing] community is that data scientists today still must build artisanal, one-off data connectors and pre-processing pipelines completely manually. Unstructured [delivers] a comprehensive solution for connecting, transforming, and staging natural language data for LLMs.”

Bain Capital Venture Associates LLC led the USD 25 million round, with participation from M12 Ventures LLC, MongoDB Ventures, Mango Capital Inc., and Shield Capital Partners LP. It is the first publicly disclosed fundraising since the business was founded a year ago.