Highlights:

  • Patronus AI’s software employs AI to automatically create adversarial prompts, which test an LLM’s reliability by attempting to elicit unwanted output.
  • One of the focuses of Patronus AI’s platform is enhancing the reliability of LLMs equipped with RAG, or retrieval-augmented generation, capabilities.

Patronus AI Inc., a startup that helps companies identify and resolve reliability issues in their large language models, has secured a USD 17 million investment.

Notable Capital spearheaded the Series A funding round, with participation from publicly traded observability provider Datadog Inc., Lightspeed Venture Partners, Factorial Capital, and several angel investors from the tech industry. This injection of funds raises Patronus AI’s total external financing to USD 20 million.

Inaccurate information in prompt responses is just one of the risks companies need to address before deploying a large language model (LLM) to production. User prompts can sometimes lead the model to generate copyrighted material or reveal sensitive business data. Subtler problems also need to be addressed, such as LLM output that doesn’t follow a company’s text style guidelines.

San Francisco-based Patronus AI has created a platform designed to help developers tackle these challenges. The company says its software uses AI to automatically create adversarial prompts, which test an LLM’s reliability by attempting to trick it into producing undesired output.

The platform also offers prepackaged reliability evaluations developed by the company. Additionally, Patronus AI has integrated a dashboard that visualizes the results of these reliability tests using charts. For instance, if an evaluation includes 100 prompts intended to test the accuracy of an LLM’s responses, the dashboard can show how many of those prompts were handled incorrectly.
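To make the 100-prompt example concrete, here is a minimal Python sketch of the kind of tally such a dashboard might chart. It is not Patronus AI's API: the `EvalResult` record, the `summarize` function, and the sample data are all hypothetical and assume each prompt has already been judged as passed or failed.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical record of one evaluated prompt."""
    prompt: str
    passed: bool  # True if the response was judged correct


def summarize(results: list[EvalResult]) -> dict:
    """Count how many prompts were handled incorrectly."""
    total = len(results)
    failed = sum(1 for r in results if not r.passed)
    return {
        "total": total,
        "failed": failed,
        # Guard against an empty evaluation run.
        "failure_rate": failed / total if total else 0.0,
    }


# Made-up run: 100 prompts, every tenth one handled incorrectly.
results = [EvalResult(f"prompt-{i}", passed=(i % 10 != 0)) for i in range(100)]
summary = summarize(results)
print(summary)
```

In this made-up run, 10 of the 100 prompts fail, which is the figure a chart of incorrectly handled prompts would display.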

“Model hallucinations and safety risks are here to stay. What enterprises need is transparency into model performance and accuracy in order to circumvent risks. For the first time, we’re giving companies a way to truly understand what they are working with so they can deploy LLMs with confidence,” said Chief Executive Anand Kannappan.

One of the use cases Patronus AI aims to address with its platform is enhancing the reliability of LLMs equipped with RAG (retrieval-augmented generation) features. Standard language models generate responses based solely on information from their training datasets. In contrast, a RAG-enabled LLM can supplement that knowledge by retrieving information from external data sources, which can improve response quality.

The process of incorporating data from external sources into an LLM’s prompts involves multiple steps. According to Patronus AI, developers can use its platform to ensure these steps are executed correctly. The company claims its software delivers at least 20% better “evaluation performance” compared to competing methods.
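For readers unfamiliar with those steps, the Python sketch below shows a toy version of the first two: retrieving relevant passages and adding them to the prompt. It is an illustration under simplifying assumptions, not Patronus AI's software. Retrieval here is naive word overlap rather than the vector search production systems typically use, the function names and documents are invented, and the final step of sending the prompt to an LLM is omitted.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Step 2: place the retrieved passages in the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Invented knowledge base for illustration.
docs = [
    "The refund window is 30 days from delivery.",
    "Support hours are 9 to 5 on weekdays.",
    "Refund requests need an order number.",
]
query = "How long is the refund window?"
passages = retrieve(query, docs, k=2)
prompt = build_prompt(query, passages)
print(prompt)  # Step 3 (not shown) would send this prompt to an LLM.
```

A mistake at either step, such as retrieving the wrong passage or dropping it from the prompt, would degrade the final answer, which is why each stage is worth checking on its own.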

Developers can also utilize Patronus AI to identify the most suitable LLM for a specific software project. By using the platform, an application team can test multiple models with the same set of prompts to determine which one produces the most accurate responses. The company states that its platform supports both off-the-shelf and customized LLMs.
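A side-by-side comparison of this kind can be sketched in a few lines of Python. The example below is hypothetical and does not use Patronus AI's platform: the two "models" are stand-in functions with canned answers, and accuracy is scored by exact string match, which is far cruder than a real evaluation would be.

```python
from typing import Callable

Model = Callable[[str], str]  # a model is anything that maps a prompt to a response


def accuracy(model: Model, cases: list[tuple[str, str]]) -> float:
    """Fraction of prompts whose response exactly matches the expected answer."""
    correct = sum(1 for prompt, expected in cases if model(prompt).strip() == expected)
    return correct / len(cases)


def pick_best(models: dict[str, Model], cases: list[tuple[str, str]]):
    """Score every model on the same prompts and return the top scorer."""
    scores = {name: accuracy(fn, cases) for name, fn in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# The same prompt set is used for every candidate model.
cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]

# Stand-in models with canned answers; model_a gets one prompt wrong.
model_a = lambda p: {"2+2": "4", "capital of France": "Paris", "3*3": "6"}.get(p, "")
model_b = lambda p: {"2+2": "4", "capital of France": "Paris", "3*3": "9"}.get(p, "")

best, scores = pick_best({"model_a": model_a, "model_b": model_b}, cases)
print(best, scores)
```

Because every candidate sees the identical prompt set, differences in score reflect the models rather than the test.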

Sometimes, a language model that performs well initially in production becomes less accurate over time. This issue arises when the types of prompts users enter into the LLM change. Patronus AI’s platform addresses this by providing an application programming interface that enables developers to continuously monitor and evaluate a deployed LLM for gradual declines in accuracy.
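One simple way to detect such a decline is to track accuracy over a rolling window of recent responses and flag when it falls below a baseline. The Python sketch below illustrates that idea only. The `AccuracyMonitor` class, its parameters, and its thresholds are hypothetical, it is not Patronus AI's API, and it assumes each production response has already been judged correct or incorrect.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Hypothetical rolling-window monitor for a deployed model."""

    def __init__(self, window: int, baseline: float, tolerance: float):
        self.results = deque(maxlen=window)  # oldest results drop off automatically
        self.baseline = baseline             # accuracy measured at launch
        self.tolerance = tolerance           # allowed drop before flagging

    def record(self, correct: bool) -> None:
        """Log whether the latest response was judged correct."""
        self.results.append(correct)

    def rolling_accuracy(self) -> Optional[float]:
        """Accuracy over the window, or None until the window is full."""
        if len(self.results) < self.results.maxlen:
            return None
        return sum(self.results) / len(self.results)

    def degraded(self) -> bool:
        """True once rolling accuracy falls below baseline minus tolerance."""
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.baseline - self.tolerance


monitor = AccuracyMonitor(window=10, baseline=0.9, tolerance=0.05)

for _ in range(10):  # the model starts out answering correctly
    monitor.record(True)
healthy = monitor.degraded()

for _ in range(4):   # user prompts shift and errors appear
    monitor.record(False)
print(healthy, monitor.rolling_accuracy(), monitor.degraded())
```

In this sketch the monitor stays quiet while all ten recent responses are correct, then flags a decline once four errors push the window's accuracy to 0.6, below the 0.85 threshold.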

The company intends to use the funds from its newly disclosed financing round to bolster product development, expand its AI research and engineering teams, and strengthen its go-to-market efforts. The hiring drive is expected to double the company’s headcount to approximately 24 employees by the end of the year.