Highlights:

  • The agent is powered by a newly introduced OpenAI model called CUA, which is partially based on the company’s multimodal GPT-4o large language model.
  • OpenAI explained that users can take control at any point during the process. For sensitive actions, such as entering login credentials, Operator prompts users to switch to manual mode.

OpenAI has launched Operator, an AI agent that can autonomously perform tasks on behalf of users.

Meanwhile, two major competitors also announced updates to their offerings. Perplexity AI Inc., known for its popular AI search engine, introduced a similar agent for its Android app. Anthropic PBC, which already offers automation features, launched a tool to enhance citation quality in AI-generated responses.

Initially available as a research preview in the premium Pro tier of ChatGPT, OpenAI’s Operator can perform multistep tasks like ordering groceries, booking flights, and filling out forms. Users can simply provide instructions using natural language prompts.

The agent is powered by a newly introduced OpenAI model called CUA, which is partially based on the company’s multimodal GPT-4o large language model. OpenAI states that CUA integrates the LLM with “advanced reasoning through reinforcement learning.”

When users instruct Operator to perform tasks on a website, the agent navigates to the appropriate URL using its built-in browser. It can type, click, and scroll to complete the requested actions, taking regular screenshots to ensure everything is functioning correctly.
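The navigate, act, and screenshot cycle described above can be sketched roughly as follows. This is a minimal illustration, not OpenAI's implementation: `FakeBrowser` and `run_task` are hypothetical names, the browser is a stand-in that only records what it is asked to do, and the action list is fixed in advance, whereas a real agent would presumably choose each next action from the latest screenshot.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for the agent's built-in browser."""
    url: str = ""
    log: list = field(default_factory=list)
    screenshots: int = 0

    def goto(self, url):
        self.url = url
        self.log.append(("goto", url))

    def act(self, kind, arg):
        # Record the action instead of driving a real page.
        self.log.append((kind, arg))

    def screenshot(self):
        self.screenshots += 1
        return f"screenshot-{self.screenshots} of {self.url}"


ALLOWED_ACTIONS = ("type", "click", "scroll")


def run_task(browser, url, actions):
    """Open the URL, perform each action, and capture a screenshot after every step."""
    browser.goto(url)
    observations = [browser.screenshot()]
    for kind, arg in actions:
        if kind not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {kind}")
        browser.act(kind, arg)
        observations.append(browser.screenshot())
    return observations


if __name__ == "__main__":
    steps = [("type", "milk"), ("click", "search"), ("scroll", 300)]
    for line in run_task(FakeBrowser(), "https://example.com", steps):
        print(line)
```

Taking a screenshot after every action is what lets the agent check that each step had the intended effect before moving on.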

OpenAI explained that users can take control at any point during the process. For sensitive actions, such as entering login credentials, Operator prompts users to switch to manual mode. While the user is in control, the agent stops taking screenshots and resumes only once the manual step is finished.
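One way to picture the handoff is a session flag that suspends screenshot capture while the user is in control. The sketch below is an assumption-laden toy: the `Session` class, its `sensitive` flag, and the `resume` method are hypothetical and do not reflect how Operator is actually built.

```python
class Session:
    """Hypothetical session that pauses screenshots during manual mode."""

    def __init__(self):
        self.manual = False
        self.screenshots = []

    def capture(self, label):
        # No screenshots are stored while the user is in control.
        if self.manual:
            return False
        self.screenshots.append(label)
        return True

    def step(self, action, sensitive=False):
        # A sensitive action hands control to the user before anything is captured.
        if sensitive:
            self.manual = True
        self.capture(action)
        return "user" if self.manual else "agent"

    def resume(self):
        # The user hands control back to the agent.
        self.manual = False


if __name__ == "__main__":
    session = Session()
    print(session.step("open site"))
    print(session.step("enter password", sensitive=True))
    session.resume()
    print(session.step("add to cart"))
    print(session.screenshots)
```

Flipping the flag before capturing means the sensitive step itself never reaches the screenshot log.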

Operator includes several data protection features. Users can log it out of all accounts with a single click and opt out of having their data used for AI training. Additionally, a security system is in place to detect and block malicious websites that attempt to trick the agent into revealing sensitive information.

Operator can also be customized. For instance, users can save a shopping list and have the agent purchase the specified items whenever it visits a particular e-commerce site. Users can also set customization options that apply to every website the agent interacts with.

Looking ahead, OpenAI intends to extend Operator’s availability beyond ChatGPT Pro to other subscription tiers. The company also plans to make the agent accessible through its application programming interface (API). Future updates will include enhancements designed to improve Operator’s ability to handle more complex tasks.

“Operator is currently in an early research preview, and while it’s already capable of handling a wide range of tasks, it’s still learning, evolving and may make mistakes,” OpenAI researchers reported. “Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety.”

OpenAI competitor Perplexity AI has introduced its own agent, Perplexity Assistant, now available in its Android app. The assistant can automate tasks such as making e-commerce purchases, booking taxis, and more. It also features multimodal processing, enabling it to analyze content from a user’s smartphone camera and screen.

At launch, Perplexity Assistant supports actions in apps like Spotify, YouTube, and Uber, as well as email, messaging, and clock applications. The company plans to expand support to additional services in the future.

Another OpenAI rival, Anthropic, also unveiled a product update. Anthropic offers the enterprise-focused Claude LLM series via an API. A new feature, Citations, allows users to upload documents to a Claude model and have it highlight the exact sentences referenced when generating responses to prompts.