OpenAI Introduces Operator & Agents

Here is everything you need to know:

Operator is a system that can use a web browser to accomplish tasks. Operator can look at a webpage and interact with it by typing, clicking, and scrolling.

It's available as a research preview for Pro users in the US, with Plus users getting access later.

Operator can perform a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.

Here is an example where a user is asking Operator to book a table for two.

Operator instantiates a remote browser. The agent clicks around and interacts with the webpage to complete the task.

If Operator needs a location, it can use the custom instructions to guide itself.

For critical actions, Operator asks the user for confirmation.

You can also use Operator for shopping, for example by providing a shopping list as an image.

Operator is based on a model called Computer-Using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning through reinforcement learning (RL), CUA is trained to interact with graphical user interfaces (GUIs): the buttons, menus, and text fields people see on a screen.

CUA works from screenshots, not APIs! It interacts with the browser using only the actions a mouse and keyboard allow, which removes the need for custom API integrations. It uses an inner monologue to decide which action to take next based on the screenshots.
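To make the screenshot-driven loop concrete, here is a minimal toy sketch of an observe, reason, act cycle. It is not OpenAI's implementation and uses no real OpenAI API: FakeBrowser, Action, run_agent, and scripted_policy are all hypothetical names, a "screenshot" is just a string summarizing page state, and a fixed rule set stands in for the model's inner monologue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical action type: limited to what a mouse and keyboard can do.
@dataclass
class Action:
    kind: str         # "click", "type", "scroll", or "done"
    target: str = ""  # element label (a stand-in for screen coordinates)
    text: str = ""    # text to type, if any


@dataclass
class FakeBrowser:
    """Toy stand-in for a remote browser; 'screenshots' are plain strings."""
    inputs: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> str:
        status = "submitted" if self.submitted else "form"
        filled = ",".join(f"{k}={v}" for k, v in sorted(self.inputs.items()))
        return f"{status}|{filled}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.inputs[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True
        # "scroll" changes nothing on this one-screen toy page


Policy = Callable[[str, List[Action]], Action]


def run_agent(browser: FakeBrowser, policy: Policy, max_steps: int = 10) -> List[Action]:
    """Repeat: observe a screenshot, pick one action, apply it, until 'done'."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = browser.screenshot()     # observe (pixels in the real system)
        action = policy(shot, history)  # reason about the next step
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)           # act through mouse/keyboard only
    return history


def scripted_policy(shot: str, history: List[Action]) -> Action:
    # Hypothetical stand-in for the model: fixed rules keyed on the "screenshot".
    if shot.startswith("submitted"):
        return Action("done")
    if "party_size=" not in shot:
        return Action("type", "party_size", "2")
    return Action("click", "submit")


if __name__ == "__main__":
    browser = FakeBrowser()
    for step in run_agent(browser, scripted_policy):
        print(step)
```

The point of the design is that the agent never calls a site-specific API: each step only sees the latest screen and emits a generic input action, so the same loop works on any page.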

You can also take over from Operator, for example to add further instructions, and then hand control back. Operator can't see what you do while you're in control, so that interaction stays private.

Below is an example of buying tickets or finding information about events.

You can also run tasks in parallel. If you don't specify a website, Operator can simply browse the web instead of going directly to a particular app or service.

Here are some details on the safety aspect of Operator.

It can refuse harmful tasks, avoid blocked websites, and prevent spam. Confirmation is a key mitigation strategy built into Operator, and an interesting prompt injection monitor adds an extra layer of security.
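Two of these mitigations, blocked websites and confirmation before critical actions, can be pictured as a gate every proposed action passes through. The sketch below is a hypothetical illustration, not how Operator actually enforces them: the Action type, the critical flag, the BLOCKED_DOMAINS list, and the gate function are all assumptions, and the prompt injection monitor is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.parse import urlparse


@dataclass
class Action:
    kind: str               # e.g. "click", "type", "navigate"
    target: str             # element label or URL
    critical: bool = False  # hypothetical flag: purchase, send, submit, ...


# Hypothetical blocklist; a real product would maintain its own.
BLOCKED_DOMAINS = {"blocked.example"}


def is_blocked(url: str, blocked: Iterable[str] = BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blocked)


def gate(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Return 'block', 'skip', or 'run' for a proposed action."""
    if action.kind == "navigate" and is_blocked(action.target):
        return "block"  # never visit blocked sites
    if action.critical and not confirm(action):
        return "skip"   # the user declined a critical action
    return "run"


if __name__ == "__main__":
    order = Action("click", "place_order", critical=True)
    # Simulate a user who declines the confirmation prompt.
    print(gate(order, confirm=lambda a: False))
```

Asking only for critical actions keeps routine clicks and typing uninterrupted while still putting the user in the loop for anything with real-world consequences.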

Here is a good example of when Operator requires the user to take control.

In this case, it asks the user for an email address to continue signing in, which is why that part of the interaction is kept private.

Here is the performance of CUA on the OSWorld and WebArena benchmarks. CUA outperforms the previous state of the art (SoTA) but still falls well short of human performance.

The OpenAI folks mentioned that the model will be made available in the coming weeks. Sam ended the demo by saying that this is the beginning of their next step into agents (the "level 3" tier).

Here is the full demo:


More articles by Elvis S.

  • My Favorite LLM Papers for October

    Here's a list of my favorite LLM papers I read this month: 1/ Zephyr LLM - a 7B parameter model with competitive…
  • Tracking LLMs with Comet

    When building with LLMs, you will spend a lot of time optimizing prompts and diagnosing LLMs. As you put your solutions…
  • How To Build a Custom Chat LLM on Your Data

    This is one of the fastest ways to build a custom ChatGPT-like system on top of your data. It's called ChatLLM (by…
  • Data Exploration with Chat Powered by GPT-4

    As an ML Engineer, this is one of the most useful applications of GPT-4 I've seen. Chat Explore is a powerful…
  • Open Source Solution Replicates ChatGPT Training Process

    ChatGPT is the biggest buzz in AI today! ChatGPT demonstrates remarkable capabilities so there is a high interest to…
  • New Conversational AI Tool Lets You “Chat” With Your Data

    As an ML engineer, one area where I spend a lot of time is data engineering. Can we use conversational AI technologies…
  • Analyzing Worldwide Energy Production with Kibana Lens

    While there are many tools that can be used to perform a quick analysis of large-scale data, data analysis in itself is…
  • XLNet outperforms BERT on several NLP Tasks

    Two pretraining objectives that have been successful for pretraining neural networks used in transfer learning NLP are…