An AI took control of my computer ...
...and it didn't exactly go to plan!
TL;DR: This feature is in beta, so it has a lot of problems, but the potential is enormous. It's easy to set up and works as a basic proof of concept for an incredible idea, but don't expect it to change your life this week.
Background
Anthropic recently released a beta feature called Computer Use, which essentially lets an AI take control of your computer and carry out the tasks you give it on its own. It's like all the effort they saved on coming up with a creative name went straight into making the feature itself great. The setup is relatively easy; I had it running in under five minutes!
Setup Steps
Anthropic recommends using a virtual machine (VM) to test this feature because it's still in beta, i.e., not fully polished and possibly buggy. Since the AI can literally control your computer (e.g., send a random email or even delete your operating system), a VM is the safest option. Letting the AI use a VM is like letting a kid play with a dollhouse: they might burn it to the ground, but your real house stays intact.
Once I set up the VM and ran the feature, it booted into a 2000s-style Ubuntu (?) desktop with basic apps like a browser and spreadsheet. On the left, there was a simple chatbot-style interface where I could type in tasks in plain text, and the AI would then control the screen to complete them.
P.S. if you want to see the detailed tutorial for setting this up from scratch, let me know in the comments below.
The Tests
I asked the AI to do two things: first, find and download an image of Albert Einstein; second, find Nvidia's stock price from Yahoo Finance for the last three months and download the data.
How Does It Work?
The feature works by taking screenshots of the current screen, identifying buttons, text, etc., and using their coordinates to navigate and click. For example, if it wants to close a tab, it takes a screenshot, locates the "close tab" button, finds its coordinates, navigates there, and clicks.
At each step, whether navigating or clicking, it takes a new screenshot. You can follow the actions it decides on in the chat on the left:
Tool Use: computer
Input: {'action': 'mouse_move', 'coordinate': [511, 101]}
Tool Use: computer
Input: {'action': 'left_click'}
Tool Use: computer
Input: {'action': 'key', 'text': 'Return'}
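To make the log above concrete, here is a minimal sketch of how such a sequence of actions could be replayed. Everything in it is an assumption for illustration: `FakeScreen` and `replay` are hypothetical names, the "screen" is just a Python object that tracks the pointer, and nothing here calls Anthropic's actual API or touches a real desktop. Only the three action dictionaries are copied from the log.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: tracks the pointer and an event log."""
    pointer: tuple = (0, 0)
    events: list = field(default_factory=list)
    screenshots_taken: int = 0

    def screenshot(self):
        # A real tool would capture pixels; here we only count the calls.
        self.screenshots_taken += 1
        return {"pointer": self.pointer}

    def apply(self, action):
        kind = action["action"]
        if kind == "mouse_move":
            x, y = action["coordinate"]
            self.pointer = (x, y)
            self.events.append(("move", x, y))
        elif kind == "left_click":
            # The click carries no coordinates; it lands where the pointer is.
            self.events.append(("click", *self.pointer))
        elif kind == "key":
            self.events.append(("key", action["text"]))
        else:
            raise ValueError(f"unsupported action: {kind!r}")


def replay(screen, actions):
    """Apply each action, then take a screenshot, as described in the article."""
    for action in actions:
        screen.apply(action)
        screen.screenshot()
    return screen.events


# The three actions from the chat log above.
ACTIONS = [
    {"action": "mouse_move", "coordinate": [511, 101]},
    {"action": "left_click"},
    {"action": "key", "text": "Return"},
]

screen = FakeScreen()
events = replay(screen, ACTIONS)
print(events)
```

The sketch shows why `left_click` needs no coordinates of its own: the preceding `mouse_move` has already placed the pointer, so the click simply happens wherever the pointer sits.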
How Did It Go?
"Promising proof-of-concept" is a fair summary. It's promising because this feels like the first step in a seismic shift in how we use computers. This must be how people felt when the Apple II came out. It's building on familiar technology but changing how we interact with it entirely.
In terms of current status, though, it didn't complete either task successfully. For the image download, it got stuck in a loop trying to close the initial popup you see when opening Google. Eventually, I hit a rate limit, which meant I couldn't use the feature for the next 30 minutes.
For the stock price task, it managed to find the Yahoo Finance page but struggled to get to the historical data section and chose the wrong date range. It couldn't find the download button and kept trying various actions (except scrolling, for some reason) until I hit another rate limit. I tried both tasks a couple of times each and gave up in the end.
The Problems
Where Do We Go From Here?
The potential here is enormous. Imagine a future where you could ask your computer to prepare a report, automate repetitive tasks, or even navigate complex software just by describing what you need. We're likely looking at a whole new way to interact with computers and information. While computer automation tools have existed before, they were rule-based and struggled to adapt to varied scenarios. LLMs, with their nuanced understanding, could be a game changer (I know, overused phrase, but it fits here). The improvements needed are obvious: faster image processing, better navigation, more safeguards, and cost reduction to make this viable.