Come and Ride the Anthropic “Computer Use” Hype Train

Josh Noble

Automation Insider & Partner at Reveal Group

Published: October 23, 2024

Did you board the Anthropic “Computer Use” Hype Train yesterday? My colleague Tom Fusinato cranked some Quad City DJs and rode the train overnight.

How does it work?

The setup is a simple, pre-configured Ubuntu Linux environment running in a Docker container.
You communicate through a familiar chatbot experience on the left and a view of the Linux desktop on the right.
Using a few technologies borrowed from common test automation setups, the demo takes screenshots, passes the screenshots to Anthropic for analysis, and returns mouse and keyboard commands.
Every step repeats this cycle. Every change due to a scroll or screen change is a new screenshot.
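
The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the Action shape and the take_screenshot, ask_model, and perform callables are hypothetical stand-ins for the real screenshot tool, the call to Anthropic, and the mouse and keyboard driver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A single mouse or keyboard command (hypothetical shape)."""
    kind: str                   # e.g. "click", "type", "scroll", "done"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None


def run_agent_loop(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes], Action],
    perform: Callable[[Action], None],
    prompt: str,
    max_steps: int = 20,
) -> List[Action]:
    """Repeat screenshot -> model -> action until the model reports it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        image = take_screenshot()          # every step starts with a fresh screenshot
        action = ask_model(prompt, image)  # the model sees the prompt plus the image
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                    # apply the mouse or keyboard command
    return history
```

Because each pass through the loop sends a new image, even a short task that needs a dozen clicks and scrolls means a dozen screenshots going over the wire.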

Those screenshots add up quickly. A session with these 3 prompts consumed 7415 input and 268 output tokens.

What applications can you open for me?
Can you get experience and education details about Thomas Fusinato from LinkedIn?
Can you find me an example purchase order form from the internet?

The entry-level Tier 1 Anthropic Sonnet plan only allows 1 million tokens / $3 spend per day. Tom ran out of tokens in just four sessions about this length. Once the demo decides to open its Firefox browser, the image count skyrockets.

I should note that the logs are very odd. They contain prompts that look like chains of thought but never appear in the user dialogue. Conversely, most of the screenshots shown in the user dialogue do not appear in the logs. Further, the token count in the logs does not match the actual token count: we hit the 1 million token rate limit, which the billing report confirmed, while the log reported only 26,265 tokens.
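
To put the mismatch in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes, for illustration, that the four sessions alone used up the full daily limit; the function and constant names are made up for this sketch.

```python
# Figures quoted in the article.
DAILY_TOKEN_LIMIT = 1_000_000    # Tier 1 limit as described above
SESSIONS_BEFORE_LIMIT = 4        # "four sessions about this length"
LOGGED_INPUT_TOKENS = 7_415      # one three-prompt session, per the log
LOGGED_OUTPUT_TOKENS = 268
LOGGED_TOTAL_AT_LIMIT = 26_265   # what the log reported when the limit was hit


def implied_tokens_per_session(limit: int, sessions: int) -> float:
    """Average tokens per session if the sessions alone exhausted the limit (assumption)."""
    return limit / sessions


def undercount_factor(actual: int, logged: int) -> float:
    """How many times larger the billed usage is than the logged usage."""
    return actual / logged


logged_session = LOGGED_INPUT_TOKENS + LOGGED_OUTPUT_TOKENS
per_session = implied_tokens_per_session(DAILY_TOKEN_LIMIT, SESSIONS_BEFORE_LIMIT)
factor = undercount_factor(DAILY_TOKEN_LIMIT, LOGGED_TOTAL_AT_LIMIT)

print(f"Logged tokens for one session: {logged_session}")
print(f"Implied tokens per session:    {per_session:.0f}")
print(f"Billed vs. logged usage:       about {factor:.0f}x")
```

Under that assumption, each session cost roughly 250,000 tokens, and the log captured only a small fraction of what was billed.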

More on rates and pricing - Rate limits - Anthropic

So how did it perform?

Prompt: Can you get experience and education details about Thomas Fusinato from LinkedIn?

Response: “I apologize, but I cannot and should not assist in collecting or scraping personal information from LinkedIn or other social media platforms.”

Prompt: Can you find me an example purchase order form from the internet?

Result: Success

Prompt: Create an excel file and fill it with some typical PO data with at least 30 lines of data

Result: It began filling out basic PO details before breaking with a nondescript error. When asked to continue, it apologized and said it was being too “verbose” with its inputs. It did a few more lines before breaking again. After a few more requests to continue, around line 20, it dumped a heap of plain text into one cell and declared it had finished.

Prompt: Re-read that created excel file, and make me a new one with similar but different inputs, with around 20 line items.

Result: Instead of creating a new file, it searched the web for instructions on how to create a purchase order. It got stuck on a HubSpot page titled “Purchase Order: What It Is & How to Create One.” It attempted to save the webpage as HTML but then broke. After another continuation prompt, it saved the HTML file and declared itself finished.

Prompt: Give me the Melbourne weather for the next 7 days.

Result: It searched Google for “weather.com Melbourne 7 days”, selected the top result, and broke once the page loaded. When prompted again, it apologized for failing and said it would try Australia’s Bureau of Meteorology (BOM). This worked: it opened the BOM website, scrolled down the page, and returned a summary of the weather forecast for the next week.

The Good

The Anthropic Computer Use demo is a creative mix of technologies and easy to set up. It clearly shows the potential for traditional virtual agents to leap forward. It is fun if you treat it as a toy that could poke your eye out. Most importantly, it isn’t Skynet and won’t replace humans anytime soon.

The Concerning

The Anthropic Computer Use demo is high-octane fuel for the AI Hype Train. It will reinvigorate executives with the notion that AI is an easy button that can magically eliminate reliable automation programs. However, everyone should consider that this is currently:

An expensive API chatterbox
Slower than a human and wastes significant time on errors
Severely hindered by rate limits
Unreliable, but it is highly confident in its actions
Brittle, as all actions rely on Cartesian screen coordinates or key sends (no DOM, XPath, etc.)
Stubborn and not easy to course-correct
A potential for significant havoc if not highly restricted.
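
The brittleness point is easy to see in a toy example. The sketch below is not how the demo works internally; it simulates a page as a set of named bounding boxes (an assumption made purely for illustration) and compares a fixed-coordinate click with a lookup by element name after the layout shifts.

```python
from typing import Dict, Optional, Tuple

# A toy "page": element name -> bounding box (left, top, right, bottom).
Layout = Dict[str, Tuple[int, int, int, int]]


def click_by_coordinates(layout: Layout, x: int, y: int) -> Optional[str]:
    """Return the name of the element under the point, or None if the click misses."""
    for name, (left, top, right, bottom) in layout.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None


def click_by_selector(layout: Layout, name: str) -> Optional[str]:
    """Return the element if it exists, wherever it happens to be on screen."""
    return name if name in layout else None


before: Layout = {"submit": (100, 200, 180, 230), "cancel": (200, 200, 280, 230)}
# The same page after a banner pushes everything down by 40 pixels.
after: Layout = {n: (l, t + 40, r, b + 40) for n, (l, t, r, b) in before.items()}

print(click_by_coordinates(before, 120, 210))  # hits the submit button
print(click_by_coordinates(after, 120, 210))   # same point now hits nothing
print(click_by_selector(after, "submit"))      # name-based lookup still works
```

A selector-based tool survives the layout shift; a coordinate-based one needs a fresh screenshot and another round trip to find the button again.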

Important: the default system prompt tells the model that it can install any Ubuntu application.

While nowhere near ready for prime time, this is an interesting foundation.

Closing

I look forward to seeing what others have tried with the Anthropic Computer Use demo over the next few days. If you want to learn more from a more complex use case, I recommend Ethan Mollick’s post here - https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse

James Walker

1 month ago

I agree with your list of concerns. In my case the demo got off to a good start but then got easily caught out by not scrolling a webpage to see all the lines in a drop down list, and from then on I kept getting RateLimitError for no obvious reason. It quickly got frustrating! Still, the potential is there and it did seem to figure out a workaround for one step by rewriting the URL which was a smart move.

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

1 month ago

That Pi demo was pretty wild! So, how are you guys handling prompt engineering for these large-scale automations, especially when it comes to incorporating things like dynamic task sequencing based on real-time data streams?
