Previewing OpenAI Operator

Last week, OpenAI introduced the much anticipated Operator feature. The concept for how it works is relatively straightforward:

Operator can “see” (through screenshots) and “interact” (using all the actions a mouse and keyboard allow) with a browser, enabling it to take action on the web without requiring custom API integrations.
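To make the "see, then interact" idea concrete, here is a minimal sketch of an observe-and-act loop. It is not how Operator is implemented (OpenAI has not published that in this form); `FakeBrowser`, `Action`, `run_agent`, and `scripted_policy` are all hypothetical names, and the "screenshot" is just a text summary so the example runs anywhere.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str         # "click", "type", or "done"
    target: str = ""  # hypothetical element label, e.g. "Flights"
    text: str = ""    # text to type, if any


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> str:
        # A real agent would receive pixels; this returns a text summary instead.
        return f"page after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        suffix = f":{action.text}" if action.text else ""
        self.log.append(f"{action.kind}:{action.target}{suffix}")


def run_agent(browser: FakeBrowser,
              decide: Callable[[str, int], Action],
              max_steps: int = 10) -> int:
    """Alternate between observing the page and acting on it; return steps taken."""
    for step in range(max_steps):
        observation = browser.screenshot()
        action = decide(observation, step)
        if action.kind == "done":
            return step
        browser.apply(action)
    return max_steps


def scripted_policy(observation: str, step: int) -> Action:
    # Assumption: a fixed plan stands in for the model that picks the next action.
    plan = [
        Action("click", "Flights"),
        Action("click", "One-way"),
        Action("type", "Origin", "JFK"),
    ]
    return plan[step] if step < len(plan) else Action("done")


browser = FakeBrowser()
steps = run_agent(browser, scripted_policy)
```

The `max_steps` cap is there so that a policy that never says "done" cannot loop forever.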

This isn’t a new idea; Anthropic released something similar for Claude (Computer Use) in October, which could do roughly the same thing but was only available via API, and an open source tool called WebUI has been around for a bit, as well. The Operator UI, however, is definitely a step forward, and OpenAI releasing it in itself makes it a big deal.

It is billed as a “Research Preview” that is only available in the US (for now) and requires a new “Pro” license of ChatGPT that runs a cool $200/month. I’m guessing that they learned from the Sora launch (which required a $20/month “Plus” license) that subscriptions are an easy way to try this stuff out, and many people will quickly forget they have them.

The website shows a video demo leveraging Hipcamp, and many of the launch partners listed are travel-related, including Priceline, Booking.com, Tripadvisor, and OpenTable. Predictably, it didn’t take long for every travel-related publication to kick off last Friday morning with an exclusive on it, including all the commentary you’d expect on it revolutionizing travel and eventually eliminating anyone who acted as an intermediary. So, I decided to whip out the credit card, upgrade my subscription, and give the next challenger that is touted to cause the imminent demise of travel agents a little whirl …

First Impressions

My first impression is that they did a really good job with the UI, much better than Anthropic or WebUI. It looked very similar to a standard ChatGPT interface, which everyone is getting used to, and the introduction of the browser was done very smartly. You can take it over at any point, and there are plenty of warnings about being careful with what you ask it to do.

The site comes with some canned examples from the launch partners, so I tried a pretty simple Booking.com one:

Book a one-way flight from JFK to LAX on Friday.

While that may sound like a pretty basic and simple query, being a travel technologist, I know it is actually a bit more complicated:

  • Deciphering “next Friday” means conquering knowledge of calendars and the English language.
  • NYC is a metropolitan airport code representing several airports in the New York City area, like JFK and LGA, although it is not really standardized, and some systems include rail and smaller airports in them. Travel 101 stuff, yes, but context the system has to understand.
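The two bullets above can be sketched in a few lines. This is only an illustration under stated assumptions: `upcoming_weekday`, `expand_airport`, and the `METRO_AIRPORTS` table are made-up names, the airport list for NYC is illustrative rather than authoritative (as noted, systems disagree on what a metro code covers), and "next Friday" is read as "the first Friday strictly after today", which is only one of the ways people use the phrase.

```python
from datetime import date, timedelta
from typing import Dict, List

FRIDAY = 4  # date.weekday() numbering: Monday is 0, Sunday is 6

# Illustrative mapping only; real systems differ on what a metro code includes.
METRO_AIRPORTS: Dict[str, List[str]] = {"NYC": ["JFK", "LGA", "EWR"]}


def upcoming_weekday(today: date, weekday: int) -> date:
    """Return the first date strictly after `today` that falls on `weekday`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1  # always 1..7
    return today + timedelta(days=days_ahead)


def expand_airport(code: str) -> List[str]:
    """Expand a metro code into its airports; pass a plain airport code through."""
    code = code.upper()
    return METRO_AIRPORTS.get(code, [code])


# Arbitrary example date (a Tuesday), chosen only to make the result checkable.
departure = upcoming_weekday(date(2025, 1, 28), FRIDAY)
origins = expand_airport("NYC")
```

Note that when "today" is itself a Friday, this reading returns the Friday a week later, which may or may not be what the traveler meant.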

I then sat back and watched. Operator launched a browser (looks like it is Chrome-derived) that went to Booking.com and started clicking around and typing:

The author's demo of OpenAI Operator on Booking.com.

To be honest, I was fairly impressed. The average person may think this would be a trivial task, but deciphering any website (not to mention a consumer-facing one geared towards the singular goal of making you buy travel) is not an easy thing to do, even for us fleshy computers called humans. Outside of just understanding the prompt, when the AI is on the webpage, it then must:

  • Understand it and click Flights.
  • Understand that you’re looking for a one-way flight and click on that.
  • Understand the origin & destination dropdowns, click those, and then enter the locations from the prompt.
  • Understand and navigate a date picker (which is actually the hardest thing it had to do, if you watch the video — and it ultimately didn’t get it right).
  • Proceed to get results and then interpret those.

In this example, perhaps as part of Booking.com being a launch partner, it just gave me a textual response in the chat prompt that it derived from the Booking.com results page:

I found a direct JetBlue flight from JFK to LAX on Friday, March 21, 2025. It departs at 6:00 AM and arrives at 9:18 AM, with a duration of 6 hours and 18 minutes. The fare is $133.30 for the "Blue Basic" option, which includes a personal item and carry-on bag. Would you like to proceed with this flight, or do you have any specific preferences or requirements?

Again, for something that is a research preview, it was impressive it had gotten this far. But all of that said, it took a grand total of 2 minutes for the Operator to perform all of these steps — far longer than a person would.

How About Something It’s Never Seen?

To throw a bit of complexity into the mix, I decided to see how it would respond to a business travel example. I gave it a more targeted prompt to use Melon, the booking platform in our Corporate Traveler USA brand:

Book me a flight through https://water.melon.travel/ for next Thurs morning leaving Boston at 7am and going to LGA.

Another moderately complex prompt that requires a little bit of know-how, from visiting a specific website to understanding abbreviations, relative dates, and mix & match airport codes.

Unlike the Booking.com example, Melon requires a login, so I was prompted to “Take control” of the session to enter my details — an extremely clever way to handle a common roadblock on many webpages. It was easy to click into the session, enter the information, and then hand it back to the Operator:

The author's demo of OpenAI Operator on Corporate Traveler's proprietary booking tool.

It was similarly impressive to the Booking.com example. It easily interpreted the prompt, navigated the screen, and picked the right dates. It was even smart enough to select the right time banding (a Melon feature, to narrow the search results) and to re-enter data it had already typed into the search after clicking ‘one way’ and finding the parameters were reset. It also understood validation confirmation, which is a common UI component to make sure free-form text entry is validated into data the systems can use.
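The "re-enter what got wiped" behavior is essentially a set-then-verify loop. Here is a minimal sketch of that pattern, not a description of Operator's internals or of Melon's actual form: `FakeSearchForm`, its field names, and `fill_until_stable` are hypothetical, and the reset-on-toggle rule is an assumption modeled on what I saw in the session.

```python
from typing import Dict


class FakeSearchForm:
    """Hypothetical search form whose trip-type toggle clears the other fields."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {
            "trip_type": "round_trip",
            "origin": "",
            "destination": "",
        }

    def set(self, name: str, value: str) -> None:
        # Assumption: changing the trip type resets origin and destination.
        if name == "trip_type" and self.fields["trip_type"] != value:
            self.fields.update(origin="", destination="")
        self.fields[name] = value

    def read(self) -> Dict[str, str]:
        return dict(self.fields)


def fill_until_stable(form: FakeSearchForm,
                      desired: Dict[str, str],
                      max_passes: int = 3) -> int:
    """Re-read the form after each pass and refill only what no longer matches.

    Returns the number of fill passes that were needed.
    """
    for completed in range(max_passes + 1):
        current = form.read()
        mismatched = [k for k, v in desired.items() if current.get(k) != v]
        if not mismatched:
            return completed
        if completed < max_passes:
            for name in mismatched:
                form.set(name, desired[name])
    raise RuntimeError("form did not reach the desired state")


form = FakeSearchForm()
# Locations are entered first and the toggle last, which triggers the reset.
passes = fill_until_stable(
    form, {"origin": "BOS", "destination": "LGA", "trip_type": "one_way"}
)
```

The point of reading the form back instead of trusting each keystroke is that the second pass only touches the two fields the toggle cleared.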

Because I have a profile in Melon, I actually had to stop the Operator before it purchased the ticket. But had I not taken control, it would have simply done it (and I do have to be in New York on Thursday, so I may go back and try it again!).

Conclusions

Overall, it is another major step forward for OpenAI and generative AI in general. The technology is impressive, including the UI, which lets a human take over the session at any point. The logic and reasoning that go into interpreting web pages and picking the right buttons stand out in particular.

That said, I’m still not exactly worried about the demise of our brands tomorrow as a result. Interpreting websites is hard work even when you specifically train tools on them (we may know a thing or two about this), but Operator’s relative level of accuracy on something it had never seen before was impressive.

Thinking out loud, some interesting use cases this could be handy for include:

  • quickly integrating website content into an application without having to use an API (or if they don’t offer one)
  • comparing pricing from other websites live during a transaction flow (because no one ever said “I found it cheaper somewhere else”, right?)
  • test automation for development
  • simulated user testing for feedback (by playing with the prompts)
  • bridging internal workflows & forms with external systems more easily.

If you consider Operator in the broader context of what generative AI is capable of, including advancements in “reasoning” models like OpenAI’s o1, DeepSeek, and others, you can start to see where things are going.

I’m excited to see the opportunities that this kind of technology brings. While we’ve been heavily investing in AI technology for a while now through our Center of Excellence approach, the focus has been largely internal. And while that has already produced some great solutions that drive productivity in our business, we’re now starting to shift more towards customer-facing solutions.

Chat is not exactly a new thing for us; we were in the game well before all the recent rage over ChatGPT and the generative AI boom. Maybe it is time for Sam :] to show off some of the new things it can do? Stay tuned and find out.


(As a footnote to demonstrate how fast this is all moving, OpenAI released Operator last week, around the same time that DeepSeek, a new Chinese startup, released R1 and claimed to have achieved OpenAI o1-level reasoning at a dramatically lower training cost, sending the markets into a tizzy yesterday, with Nvidia losing $600B in valuation. That’s how much this industry has moved in the last 4-5 days … crazy, huh?)


Ole Hammer Mortensen

Partner at AMMconsulting.dk - Consult: Travel & distribution I B2B I B2C I LEAN I TECH I Travel Management | Keynote speaker I Sustainebility | Operation |

1 month ago

Hi John, did you test with the prompt you made with Boston-Austin? Tks for the insight

Thx for sharing John! Race is on with learning bots and AI Agents. Not completely baked but will evolve quickly.

Great example and comparison, thanks John. I remember talking to you in Lisbon a few years back on the "be where I am" thought process, why would we go to a booking site when an "agent" can do it for us. I appreciate your writing style and humour as always.

Liz Fraser

Chief Revenue Officer at Serko

1 month ago

Appreciate you sharing this Jomo, along with video examples. Price comparison must be a productivity gain, and will be interesting to see vs. negotiated rates.

Nice update John … brings back a few memories of when we wrote a data trawler/scraper in 2003 to seek and purchase prized All Black v England tickets. It succeeded one evening, only for us to be told the tickets were voided as the rugby booking servers had only been switched on for a 2-min ‘test mode’! Needless to say, the rugby board apologised and re-issued our tickets. Too bad the AB’s lost!
