OpenAI Introduces Operator & Agents

Elvis S.

Cofounder & CEO at DAIR.AI | Ph.D. | Prev: Meta AI, Galactica LLM, Elastic | Prompting Guide (6M+ learners) | I teach how to build with AI

Published: January 23, 2025

OpenAI Introduces Operator & Agents!

Here is everything you need to know:

Operator is a system that can use a web browser to accomplish tasks. Operator can look at a webpage and interact with it by typing, clicking, and scrolling.

It's available as a research preview in the US for Pro users. It will be available to Plus users later.

Operator can perform a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes.

Here is an example where a user is asking Operator to book a table for two.

Operator instantiates a remote browser. The agent clicks around and interacts with the webpage to complete the task.

If Operator needs a location, it can use your custom instructions to guide itself.

For critical actions, Operator asks the user for confirmation.

You can use Operator for shopping, for example by providing a shopping list as an image.

Operator is based on a model called Computer-using Agent (CUA). Combining GPT-4o's vision capabilities with advanced reasoning through RL, CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen.

CUA interacts with screenshots, no APIs! It controls the browser using only the actions a mouse and keyboard allow, which removes the need for custom API integrations. It uses an inner monologue to decide what action to take next based on the screenshots.
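To make the screenshot-in, action-out idea concrete, here is a minimal sketch of such a loop. It is not OpenAI's implementation: `Action`, `FakeBrowser`, `scripted_policy`, and `run_agent` are all hypothetical names, the "screenshot" is just a byte string, and a hard-coded plan stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """Hypothetical action type: only what a mouse and keyboard allow."""
    kind: str  # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a remote browser; it records actions instead of executing them."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real system would return pixels; here the "image" encodes the step count.
        return f"frame-{len(self.log)}".encode()

    def apply(self, action: Action) -> None:
        self.log.append(action)


def scripted_policy(screenshot: bytes, task: str, thoughts: List[str]) -> Action:
    """Toy policy standing in for the model: one screenshot in, one action out."""
    step = int(screenshot.decode().split("-")[1])
    plan = [
        Action("click", x=120, y=40),
        Action("type", text=task),
        Action("scroll", y=300),
    ]
    if step < len(plan):
        # The appended string plays the role of the inner monologue.
        thoughts.append(f"step {step}: next I should {plan[step].kind}")
        return plan[step]
    thoughts.append(f"step {step}: task looks complete")
    return Action("done")


def run_agent(
    task: str,
    browser: FakeBrowser,
    policy: Callable[[bytes, str, List[str]], Action],
    max_steps: int = 10,
) -> List[str]:
    """Observe a screenshot, pick an action, apply it, and repeat until done."""
    thoughts: List[str] = []
    for _ in range(max_steps):
        action = policy(browser.screenshot(), task, thoughts)
        if action.kind == "done":
            break
        browser.apply(action)
    return thoughts
```

Because the loop only ever sees screenshots and only ever emits mouse and keyboard actions, nothing in it depends on a site-specific API, which is the point the paragraph above makes.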

You can also take over from Operator to add instructions and then return control. Operator can't see what you do while you are in control; this part of the interaction is private.
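One way to picture this takeover behavior is a session that stops handing frames to the agent while the user is in control. The sketch below is an assumption about how such a gate could look, not a description of Operator's internals; `Session`, `take_over`, `return_control`, and `observe` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical session that hides frames from the agent during user takeover."""
    user_in_control: bool = False
    agent_frames: List[str] = field(default_factory=list)

    def take_over(self) -> None:
        self.user_in_control = True

    def return_control(self) -> None:
        self.user_in_control = False

    def observe(self, frame: str) -> Optional[str]:
        # While the user is driving, the agent receives nothing and nothing is stored.
        if self.user_in_control:
            return None
        self.agent_frames.append(frame)
        return frame
```

For example, if the user takes over to type an email address, that frame never reaches `agent_frames`; only the frames before and after the takeover do.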

Below is an example for buying tickets or finding information about events.

You can also run tasks in parallel. If you don't specify a website, Operator can simply browse the web instead of going directly to specific apps or services.

Here are some details on the safety aspect of Operator.

It can refuse harmful tasks, avoid blocked websites, and prevent spam. Confirmation is a key mitigation strategy built into Operator. There is an interesting prompt injection monitor as an extra layer of security.
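The layering described above (blocked sites, a prompt injection monitor, and user confirmation) can be sketched as a single check that runs before each step. Everything here is illustrative and assumed: the host list, the action names, the phrase list, and the `review_step` function are hypothetical, and a keyword match is only a crude stand-in for whatever monitor Operator actually uses.

```python
from urllib.parse import urlparse

# Hypothetical configuration, for illustration only.
BLOCKED_HOSTS = {"blocked.example"}
CRITICAL_ACTIONS = {"submit_order", "send_email"}
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "disregard your instructions",
)


def review_step(url: str, action: str, page_text: str, user_confirmed: bool = False) -> str:
    """Return "refuse", "pause", "ask_user", or "allow" for one proposed step."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_HOSTS:
        return "refuse"  # never act on a blocked website
    lowered = page_text.lower()
    if any(phrase in lowered for phrase in SUSPICIOUS_PHRASES):
        return "pause"  # page text looks like an injected instruction
    if action in CRITICAL_ACTIONS and not user_confirmed:
        return "ask_user"  # critical actions need explicit confirmation
    return "allow"
```

With these assumed values, `review_step("https://shop.example/cart", "submit_order", "Your cart")` returns `"ask_user"`, and the same call with `user_confirmed=True` returns `"allow"`.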

Here is a good example of when Operator requires the user to take control.

In this case, it's asking for an email address to continue with signing in. That's why that part of the interaction is kept private.

Here is the performance of CUA on the OSWorld and WebArena benchmarks. CUA performs better than previous SoTA but still has a long way to go when compared to human performance.

The OpenAI folks mentioned that the model will be made available in the coming weeks. Sam ended the demo by saying that this is the beginning of their next step into agents (level 3 tier).

Here is the full demo:

Junhao Zhang

Co-Founder at Knitwise | VP of Operations and Sales | Chartered Financial Analyst Level III Candidate

1 month ago

Thanks for sharing the updates
