OpenAI Introduces Operator & Agents
Here is everything you need to know:
Operator is a system that can use a web browser to accomplish tasks. Operator can look at a webpage and interact with it by typing, clicking, and scrolling.
It's available as a research preview to Pro users in the US, with access for Plus users coming later.
Operator can perform a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.
Here is an example where a user is asking Operator to book a table for two.
Operator instantiates a remote browser. The agent clicks around and interacts with the webpage to complete the task.
If Operator needs a location, it can use the user's custom instructions to guide itself.
For critical actions, Operator asks the user for confirmation.
You can also use Operator for shopping, for example by providing a shopping list as an image.
Operator is based on a model called Computer-using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning through RL, CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen.
CUA works from screenshots, no APIs! It interacts with the browser using the same actions a mouse and keyboard allow, which removes the need for custom API integrations. It uses an inner monologue to decide what action to take next based on each screenshot.
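The screenshot-in, action-out cycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not OpenAI's implementation: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names, and the "policy" is a hard-coded plan standing in for a model that would reason over the pixels.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A mouse/keyboard-style action (hypothetical schema)."""
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeBrowser:
    """Stand-in for a remote browser; records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real browser would return image bytes; here we encode the step count.
        return f"screen-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        self.log.append(action)


def run_agent(browser: FakeBrowser,
              policy: Callable[[bytes, int], Action],
              max_steps: int = 10) -> int:
    """Observe a screenshot, let the policy choose an action, apply it, repeat."""
    for step in range(max_steps):
        shot = browser.screenshot()
        action = policy(shot, step)
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


def scripted_policy(shot: bytes, step: int) -> Action:
    """Hypothetical policy: a fixed plan in place of model reasoning."""
    plan = [
        Action("click", x=120, y=40),
        Action("type", text="table for two"),
        Action("done"),
    ]
    return plan[min(step, len(plan) - 1)]


browser = FakeBrowser()
steps_taken = run_agent(browser, scripted_policy)
```

The point of the sketch is that the agent only ever sees pixels and only ever emits generic input events, so nothing in the loop is specific to any one website.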
You can also interact with Operator if you want to add additional instructions and then return the control. Operator can't see when you take over -- this interaction is private.
Below is an example for buying tickets or finding information about events.
You can also run tasks in parallel. If you don't specify a website, Operator can simply browse the web instead of going directly to particular apps/services.
Here are some details on the safety aspect of Operator.
It can refuse harmful tasks, avoid blocked websites, and prevent spam. Confirmation is a key mitigation strategy built into Operator. There is an interesting prompt injection monitor as an extra layer of security.
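As a rough illustration of confirmation as a mitigation, here is a minimal sketch of gating critical actions behind a user prompt. The action labels in `CRITICAL_KINDS` and the `execute` helper are assumptions made for this example; how Operator actually classifies critical actions was not detailed.

```python
from typing import Callable

# Hypothetical labels for actions that should never run without approval.
CRITICAL_KINDS = {"submit_order", "send_email", "make_payment"}


def needs_confirmation(action_kind: str) -> bool:
    """Return True when the action is on the critical list."""
    return action_kind in CRITICAL_KINDS


def execute(action_kind: str, confirm: Callable[[str], bool]) -> str:
    """Run an action, asking the user first if it is critical."""
    if needs_confirmation(action_kind) and not confirm(action_kind):
        return "cancelled"
    return "executed"
```

In this sketch, `confirm` would be whatever mechanism surfaces the question to the user; non-critical actions such as scrolling never trigger it.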
Here is a good example of when Operator requires the user to take control.
In this case, it's asking for an email address to continue with signing in. That's why that part of the interaction is kept private.
Here is the performance of CUA on the OSWorld and WebArena benchmarks. CUA performs better than previous SoTA but still has a long way to go when compared to human performance.
The OpenAI folks mentioned that the model will be made available in the coming weeks. Sam ended the demo by saying that this is the beginning of their next step into agents (level 3 tier).
Here is the full demo: