Come and Ride the Anthropic “Computer Use” Hype Train
Did you board the Anthropic “Computer Use” Hype Train yesterday? My colleague Tom Fusinato cranked some Quad City DJs and rode the train overnight.
How does it work?
Those screenshots add up quickly. A session with three prompts consumed 7,415 input and 268 output tokens.
The entry-level Tier 1 Anthropic plan for Sonnet allows only 1 million tokens (about $3 of spend) per day. Tom ran out of tokens after just four sessions of about this length. Once the demo decides to open its Firefox browser, the image count skyrockets.
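Why do screenshots add up so fast? Anthropic's vision documentation gives a rule of thumb of roughly (width × height) / 750 tokens per image. The Messages API is also stateless, so each new request carries the conversation so far, including earlier screenshots unless the client prunes them. The sketch below makes two assumptions: a 1024×768 display (the demo's resolution is configurable) and no screenshots dropped from the history. Both functions are hypothetical helpers, not part of any Anthropic SDK.

```python
def screenshot_tokens(width_px: int, height_px: int) -> int:
    """Approximate tokens for one image using the (w * h) / 750 rule of thumb."""
    return round(width_px * height_px / 750)


def cumulative_image_tokens(turns: int, per_image: int) -> int:
    """Total image tokens sent over `turns` requests when every request
    resends all screenshots taken so far (1 + 2 + ... + turns images).
    Assumes one new screenshot per turn and no pruning of old ones."""
    return per_image * turns * (turns + 1) // 2


if __name__ == "__main__":
    per_image = screenshot_tokens(1024, 768)  # assumed XGA display
    print(per_image)  # 1049
    for turns in (5, 10, 20):
        print(turns, cumulative_image_tokens(turns, per_image))
```

Under these assumptions, a 20-step session would send about 220,000 image tokens. A handful of browser-heavy sessions could therefore plausibly approach a 1 million token daily cap.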
I should note that the logs are very odd. Some prompts in the logs appear to be chains of thought that never show up in the user dialogue. Conversely, most of the screenshots shown in the user dialogue do not appear in the logs. Further, the token count in the logs does not match actual usage: we hit the 1 million token rate limit, confirmed by the billing report, while the log reported only 26,265 tokens.
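A quick back-of-the-envelope check, using only the figures quoted above, shows how large the gap is. This is plain arithmetic, not a reading of Anthropic's metering. It assumes the daily cap is exactly 1,000,000 tokens and that every session was roughly the size of the three-prompt one. The constant and function names are illustrative.

```python
DAILY_TOKEN_CAP = 1_000_000   # Tier 1 daily limit quoted above
SESSION_INPUT_TOKENS = 7_415  # three-prompt session, input
SESSION_OUTPUT_TOKENS = 268   # three-prompt session, output
LOGGED_TOKENS = 26_265        # total the demo log reported
SESSIONS_TO_HIT_CAP = 4       # sessions Tom ran before hitting the cap


def expected_sessions_per_day(cap: int, per_session: int) -> int:
    """Whole sessions that fit under the cap if each uses `per_session` tokens."""
    return cap // per_session


def implied_tokens_per_session(cap: int, sessions: int) -> float:
    """Average tokens per session if the cap was used up in `sessions` sessions."""
    return cap / sessions


if __name__ == "__main__":
    per_session = SESSION_INPUT_TOKENS + SESSION_OUTPUT_TOKENS  # 7,683
    print(expected_sessions_per_day(DAILY_TOKEN_CAP, per_session))  # 130
    implied = implied_tokens_per_session(DAILY_TOKEN_CAP, SESSIONS_TO_HIT_CAP)
    print(implied)  # 250000.0
    print(round(implied / per_session, 1))  # 32.5
    print(round(DAILY_TOKEN_CAP / LOGGED_TOKENS, 1))  # 38.1
```

If each session really used around 7,700 tokens, roughly 130 of them should fit in a day. Hitting the cap in four implies an average closer to 250,000 tokens per session, about 32 times the three-prompt example. That fits with how quickly the screenshots pile up once Firefox opens. Billed usage was also about 38 times what the log reported.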
More on rate limits and pricing: “Rate limits” in Anthropic's documentation.
So how did it perform?
Prompt: Can you get experience and education details about Thomas Fusinato from LinkedIn?
Response: “I apologize, but I cannot and should not assist in collecting or scraping personal information from LinkedIn or other social media platforms.”
Prompt: Can you find me an example purchase order form from the internet?
Result: Success
Prompt: Create an excel file and fill it with some typical PO data with at least 30 lines of data
Result: It began filling out basic PO details before breaking with a nondescript error. When asked to continue, it apologized and said it was being too “verbose” with its inputs. It did a few more lines before breaking again. After a few more requests to continue, around line 20, it dumped a heap of plain text into one cell and declared it had finished.
Prompt: Re-read that created excel file, and make me a new one with similar but different inputs, with around 20 line items.
Result: Instead of creating a new file, it searched the web for instructions on how to create a purchase order. It got stuck on a HubSpot page titled “Purchase Order: What It Is & How to Create One.” It attempted to save the webpage as HTML but then broke. After another continuation prompt, it saved the HTML file and declared itself finished.
Prompt: Give me the Melbourne weather for the next 7 days.
Result: It searched Google for “weather.com Melbourne 7 days”, selected the top result, and broke once the page loaded. When prompted again, it apologized for failing and said it would try Australia’s Bureau of Meteorology (BOM). This worked: it opened the BOM website, scrolled down the page, and returned a summary of the weather forecast for the next week.
The Good
The Anthropic Computer Use demo is a creative mix of technologies and easy to set up. It clearly shows the potential for traditional virtual agents to leap forward. It is fun, as long as you treat it as a toy that could poke your eye out. Most importantly, it isn’t Skynet and won’t replace humans anytime soon.
The Concerning
The Anthropic Computer Use demo is high-octane fuel for the AI hype train. It will reinvigorate executives with the notion that AI is an easy button that can magically replace reliable automation programs. However, everyone should consider that this is currently:
Important: the default system prompt tells Claude that it can install any Ubuntu application.
While nowhere near ready for prime time, this is an interesting foundation.
Closing
I look forward to seeing what others have tried with the Anthropic Computer Use demo over the next few days. If you want to see a more complex use case, I recommend Ethan Mollick’s post here: https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse
I agree with your list of concerns. In my case, the demo got off to a good start but was then easily caught out when it failed to scroll a webpage to see all the options in a drop-down list, and from then on I kept getting RateLimitError for no obvious reason. It quickly got frustrating! Still, the potential is there, and it did seem to figure out a workaround for one step by rewriting the URL, which was a smart move.
Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer
1 month
That Pi demo was pretty wild! So, how are you guys handling prompt engineering for these large-scale automations, especially when it comes to incorporating things like dynamic task sequencing based on real-time data streams?