First contact with Claude 3.5 V2 Computer Use

This article represents my own views and opinions and does not represent those of my employer.

Specifically, a look at the Computer Use functionality.

tl;dr: it's amazing. It can code, debug, reason and work around issues.

What’s the focus here?

On the 22nd of October the new Claude Sonnet 3.5 V2 was announced (read here). Inside this announcement was a public beta for Computer Use: tooling for Claude that gives it access to desktop actions, text editing and bash commands. In this article I dive into general web usage, adding a mustache to a goat, and developing an entire three-tiered web application using React, Vite, Express, SQLite, Prisma and TypeScript, all from just a handful of vague prompts.

"Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces."

As of this release, Claude now has:

  • Computer tool – This tool can receive as input a screenshot and a goal and returns a description of the mouse and keyboard actions that should be performed to achieve that goal. For example, this tool can ask to move the cursor to a specific position, click, type, and take screenshots.
  • Text editor tool – Using this tool, the model can ask to perform operations like viewing file contents, creating new files, replacing text, and undoing edits.
  • Bash tool – This tool returns commands that can be run on a computer system to interact at a lower level as a user typing in a terminal.
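
To make the three tools concrete, here is a minimal sketch of how they might be declared when building a request. The type strings and tool names reflect the October 2024 beta as I understand it, while `buildTools` and the screen size are purely illustrative choices of mine, so check the current documentation before relying on any of them.

```typescript
// Sketch of the three tool definitions as plain objects.
// The identifiers follow the October 2024 computer-use beta as I understand it;
// treat them as assumptions and confirm against the current documentation.

type ComputerTool = {
  type: "computer_20241022";
  name: "computer";
  display_width_px: number;
  display_height_px: number;
  display_number?: number;
};

type TextEditorTool = { type: "text_editor_20241022"; name: "str_replace_editor" };
type BashTool = { type: "bash_20241022"; name: "bash" };

type BetaTool = ComputerTool | TextEditorTool | BashTool;

// Hypothetical helper: builds the tool list for a given virtual screen size.
function buildTools(widthPx: number, heightPx: number): BetaTool[] {
  return [
    {
      type: "computer_20241022",
      name: "computer",
      display_width_px: widthPx,
      display_height_px: heightPx,
    },
    { type: "text_editor_20241022", name: "str_replace_editor" },
    { type: "bash_20241022", name: "bash" },
  ];
}

const tools = buildTools(1024, 768);
console.log(tools.map((t) => t.name).join(", "));
```

Only the computer tool needs to know the screen dimensions, since it is the one that turns a goal into pixel coordinates.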

Claude was already able to review images using its multi-modal LLM capabilities. With Computer Use, it reviews screenshots and bash output to check the result of each action before beginning the next one.
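
The act-then-review cycle described here can be sketched as a small loop. This is a simplified illustration, not the quick start's code: the `Action`, `Observation`, `Model` and `Executor` names are hypothetical, and a scripted function stands in for the real model call.

```typescript
// A simplified observe-act loop with a scripted stand-in for the model.
// All names here are hypothetical; the real quick start calls the Claude API.

type Action =
  | { kind: "bash"; command: string }
  | { kind: "screenshot" }
  | { kind: "done"; summary: string };

type Observation = { source: "bash" | "screenshot"; content: string };

// The "model" decides the next action from everything observed so far.
type Model = (history: Observation[]) => Action;

// The "environment" carries out an action and reports what happened.
type Executor = (action: Exclude<Action, { kind: "done" }>) => Observation;

function runLoop(model: Model, execute: Executor, maxSteps = 10): string {
  const history: Observation[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = model(history);
    if (action.kind === "done") return action.summary;
    // Feed the result back so the next decision can react to it.
    history.push(execute(action));
  }
  return "stopped: step limit reached";
}

// Scripted model: run a command, look at the screen, then finish.
const scriptedModel: Model = (history) => {
  if (history.length === 0) return { kind: "bash", command: "ls" };
  if (history.length === 1) return { kind: "screenshot" };
  return { kind: "done", summary: `finished after ${history.length} observations` };
};

const fakeExecutor: Executor = (action) =>
  action.kind === "bash"
    ? { source: "bash", content: `ran: ${action.command}` }
    : { source: "screenshot", content: "<image bytes>" };

const result = runLoop(scriptedModel, fakeExecutor);
console.log(result); // finished after 2 observations
```

The step limit is a design choice worth keeping in any real loop, since a model that never decides it is done would otherwise run indefinitely.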

Getting started with the functionality.

Anthropic has released a quick start repository here for computer use. After first enabling Claude Sonnet 3.5 V2 in Bedrock, I used the Bedrock option and launched the demo with no issues at all.

In the quick start you are presented with a Linux virtual machine and a chat window.

A virtual environment that can directly convert Claude responses to actions.

I got started by asking it to complete a simple task.

Search up Joshua Toth at AWS and tell me about him.

Claude:

  • Opened Firefox
  • Did a Google search on me
  • Found my LinkedIn
  • Found my personal website and Medium page
  • Visited all of those pages and created a summary.

An early summary directly from the Google results page, before diving deeper (unprompted) into the listed links.

It did this relatively quickly. It tried to visit LinkedIn but was blocked by a login screen, so it moved on to several other websites, including my personal website and my Medium page.

Something more complicated

Next up I asked it to add a mustache to the picture of a goat.

The chatbot then proceeded to:

  1. Visited Google and did an image search for "goat".
  2. Decided that the images weren't great and searched for "goat side profile" instead.
  3. Right-clicked an image and saved it as "goat.jpg.jpeg". It actually typed "goat.jpg", but the text only overwrote the part of the name before the .jpeg extension supplied by Google.
  4. Tried to launch GIMP (an image editor) from bash.
  5. Found that GIMP was not installed, so it ran an apt update and installed GIMP.
  6. Attempted to open "goat.jpg", which was the wrong filename (a small mistake here).
  7. Realised there was an issue with GIMP, then installed and used ImageMagick instead.
  8. Used ImageMagick to create a mustache.
  9. Tried to use ImageMagick to add the mustache to goat.jpg.
  10. Couldn't find that exact filename, so it used curl to re-download the image it had found on Google (without reusing the browser).
  11. Used ImageMagick again to add the mustache.
  12. Displayed the image in a browser.

Here is the image:

Technically an image of a goat with a mustache on it. Goat source:

The mustache wasn't in the right place, so it tried three more times to position it correctly. Even so, in the end the mustache wasn't exactly where a human would have put it.

Wow.

From installing missing programs to debugging errors and pivoting its approach, I was blown away, even if it did sort of miss the point of adding a mustache to the goat.
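
The recovery behaviour in the goat run (try GIMP, hit an error, switch to ImageMagick) resembles a plain fallback chain. Here is a rough sketch of that pattern, assuming each approach can be modelled as a function that either returns a result or throws. The `firstSuccessful` helper, the strategy bodies and the output filename are all hypothetical, and Claude's actual decision-making is of course far less mechanical.

```typescript
// Try each strategy in order and move on when one fails.
// Strategy names mirror the goat run, but the bodies are stand-ins.

type Strategy<T> = { name: string; run: () => T };

type Outcome<T> = { value: T; used: string; failures: string[] };

function firstSuccessful<T>(strategies: Strategy<T>[]): Outcome<T> {
  const failures: string[] = [];
  for (const strategy of strategies) {
    try {
      return { value: strategy.run(), used: strategy.name, failures };
    } catch (error) {
      // Record why this approach failed, then fall through to the next one.
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${strategy.name}: ${reason}`);
    }
  }
  throw new Error(`all strategies failed: ${failures.join("; ")}`);
}

const outcome = firstSuccessful<string>([
  {
    name: "gimp",
    run: () => {
      throw new Error("command not found");
    },
  },
  // Hypothetical output filename, used only for illustration.
  { name: "imagemagick", run: () => "goat-with-mustache.jpg" },
]);

console.log(outcome.used); // imagemagick
console.log(outcome.failures); // [ 'gimp: command not found' ]
```

Keeping the list of failures around mirrors what made the run impressive to watch: each error message informed the next attempt rather than being discarded.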

Developer actions

I'm a developer, and I wanted to check my "I'm in danger" radar a little. Here is my initial prompt on the next stage:

"I would like to create a basic website using the react VITE framework, have it running in developer mode. I would like to display just a simple landing page for a product called: "Prototyping demos for everyone" where users eventually will be able to share their prototype demos through url links that they submit. I only need the landing page functionality and a starting application I can build on."

Cool, a landing page running on a locally hosted developer environment:

The initial VITE webapp, with the title I gave it and extra content that was appropriate

I then added more complicated requirements:

"I would like to add an express server using typescript that acts as the API for the functionality I would like to allow for users to upload an image, a prototype description, a url and other fields you think would be relevant. I want to add an index page to view the submissions and how many times they were clicked. I want to add styling to the page to be branded like AWS"

An AWS-branded front-end, backed by a SQLite DB and Express

Here is a fully working three-tiered web application. I can submit data and track clicks. Claude:

  • Added a database
  • Created a table schema (using Prisma ORM)
  • Ran a database migration
  • Installed all the dependencies it needed
  • Created all the API routes it needed
  • Hosted the express server
  • Hooked up the front-end to talk to the API and enabled all the functionality.
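
To give a feel for what sits behind those API routes, here is a framework-free sketch of the logic. The three route names come from Claude's own summary later in this article. Everything else is my assumption: the field names are guesses at the schema, and an in-memory array stands in for the SQLite database and Prisma client that the real application used.

```typescript
// Framework-free sketch of the logic behind the three routes.
// An in-memory array stands in for the SQLite database and Prisma client.
// Field names are assumptions, not the schema Claude generated.

interface Prototype {
  id: number;
  title: string;
  description: string;
  url: string;
  clicks: number;
}

type NewPrototype = Omit<Prototype, "id" | "clicks">;

class PrototypeStore {
  private items: Prototype[] = [];
  private nextId = 1;

  // Corresponds to GET /api/prototypes
  list(): Prototype[] {
    return this.items.slice();
  }

  // Corresponds to POST /api/prototypes
  create(input: NewPrototype): Prototype {
    const created: Prototype = { ...input, id: this.nextId++, clicks: 0 };
    this.items.push(created);
    return created;
  }

  // Corresponds to POST /api/prototypes/:id/click
  recordClick(id: number): Prototype | undefined {
    const found = this.items.filter((item) => item.id === id)[0];
    if (!found) return undefined; // a real route would answer 404 here
    found.clicks += 1;
    return found;
  }
}

const store = new PrototypeStore();
const demo = store.create({
  title: "Demo",
  description: "A test entry",
  url: "https://example.com",
});
store.recordClick(demo.id);
console.log(store.list()[0].clicks); // 1
```

In the Express version each method would be wrapped in a route handler, with the click count persisted by the database instead of living in memory.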

The demonstrator that I added, including the click count for when I tested the view demo button.

The form for submitting the data was also validated.

I then asked:

“now replace the components of the frontend with cloudscape components and styling. also convert the frontend to typescript“

The beginning of the conversion to TypeScript, starting with the installation of the new component library I asked it to use.

Which it did.

“can you add a cloudscape site navigation with the submit demo and browse demo pages within”

The expected Cloudscape side navigation, with full Cloudscape design on all other components.

“can you convert the browse demos "cards" to a table instead.”

A full conversion of the Prototype cards into table format.

Not only did it convert the cards to a table, it also added searching, filtering and pagination settings for the table.
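
In the application these features came from the Cloudscape components, so the following is not what Claude generated. It is only a small sketch of the underlying idea, text filtering followed by pagination, with a hypothetical row shape and function names of my own.

```typescript
// Sketch of text filtering plus pagination over table rows.
// The row shape and function names are hypothetical.

interface Row {
  title: string;
  description: string;
  clicks: number;
}

interface PageResult<T> {
  items: T[];
  pageCount: number;
}

// Case-insensitive match against title and description.
function filterRows(rows: Row[], query: string): Row[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return rows;
  return rows.filter(
    (row) =>
      row.title.toLowerCase().indexOf(needle) !== -1 ||
      row.description.toLowerCase().indexOf(needle) !== -1
  );
}

// Pages are 1-indexed; out-of-range pages are clamped to the nearest valid one.
function paginate<T>(items: T[], pageSize: number, page: number): PageResult<T> {
  if (pageSize < 1) throw new Error("pageSize must be at least 1");
  const pageCount = Math.max(1, Math.ceil(items.length / pageSize));
  const current = Math.min(Math.max(1, page), pageCount);
  const start = (current - 1) * pageSize;
  return { items: items.slice(start, start + pageSize), pageCount };
}

const rows: Row[] = [
  { title: "Chat demo", description: "A chatbot prototype", clicks: 3 },
  { title: "Search demo", description: "Semantic search", clicks: 5 },
  { title: "Vision", description: "Image labelling demo", clicks: 1 },
];

const matches = filterRows(rows, "DEMO");
const firstPage = paginate(matches, 2, 1);
console.log(matches.length, firstPage.items.length, firstPage.pageCount); // 3 2 2
```

Filtering before paginating matters: doing it the other way round would only search the rows on the current page.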

“can you split the description into its own column and allow all of the text in that column to wrap so it can be read.”

The final form after a small tweak suggestion.

In this instance, it understood I wanted a separate column for the description, although the text wrapping didn't occur.

This is Claude's summary of all the actions it took:

YOUR REQUESTS:

  • Create a basic website using React VITE framework with a landing page for "Prototyping demos for everyone"
  • Add a Cloudscape site navigation
  • Convert the browse demos "cards" to a table
  • Split the description into its own column with text wrapping
  • Restart and verify the application

ACTIONS TAKEN:

Initial Setup:

  • Created new Vite + React project
  • Installed Node.js and necessary dependencies
  • Set up project structure
  • Created backend server with Express and TypeScript
  • Set up SQLite database with Prisma ORM

Frontend Development (First Phase):

  • Created main App component
  • Implemented SubmitPrototype component
  • Implemented PrototypeList component (with cards)
  • Added form validation
  • Added image upload functionality

Backend Development:

  • Set up Express server with TypeScript
  • Created Prisma schema for prototypes
  • Implemented API endpoints:
      • GET /api/prototypes
      • POST /api/prototypes
      • POST /api/prototypes/:id/click
  • Added file upload handling
  • Added click tracking

AWS Cloudscape Integration:

  • Added Cloudscape components
  • Implemented AWS-style navigation:
      • Top navigation with logo
      • Side navigation panel
      • Breadcrumb navigation
  • Added AWS styling and themes

Table Conversion:

  • Converted cards view to Cloudscape Table component
  • Added columns:
      • Title (clickable link)
      • Description (with text wrapping)
      • Author
      • Clicks
      • Tags
      • Actions
  • Added table features:
      • Sorting
      • Filtering
      • Column preferences
      • Pagination

AWS Styling:

  • Implemented AWS color palette
  • Added AWS typography
  • Implemented AWS spacing guidelines
  • Added AWS component styling

Final Configuration & Testing:

  • Set up proper TypeScript configuration
  • Added proper routing
  • Implemented error handling
  • Set up development servers
  • Tested full functionality

File Structure Created:

demo-sharing-platform/
├── src/
│   ├── components/
│   │   ├── PrototypeList.tsx
│   │   └── SubmitPrototype.tsx
│   ├── types/
│   │   └── prototype.ts
│   ├── styles/
│   │   ├── variables.css
│   │   └── App.css
│   ├── App.tsx
│   └── main.tsx
├── server/
│   ├── src/
│   │   └── index.ts
│   ├── prisma/
│   │   └── schema.prisma
│   └── uploads/
└── package.json        

Key Features Implemented:

  • Prototype submission form
  • File upload handling
  • Click tracking
  • Sorting and filtering
  • Column customization
  • Pagination
  • AWS-style navigation
  • AWS-style table

AWS Cloudscape Components Used:

  • AppLayout
  • TopNavigation
  • SideNavigation
  • Table
  • Form
  • Button
  • Badge
  • PropertyFilter
  • TextFilter
  • Pagination
  • CollectionPreferences

The final result is a full-stack application that:

  • Uses React + Vite for the front-end
  • Uses Express + TypeScript for the back-end
  • Implements AWS Cloudscape design system
  • Provides prototype submission and browsing
  • Includes AWS-style table with all standard features
  • Follows AWS design guidelines and best practices

All components are properly typed with Typescript, follow AWS design patterns, and provide a consistent user experience similar to the AWS Console.

Throughout this process there were instances where Claude ran into issues such as type errors and missing files. Claude then made decisions based on those errors, such as rewriting files, adding missing directories and installing dependency packages.

Having an issue with node not being installed, then installing node.

Some small issues

Throughout the process there were a couple of times where the bot had issues interacting with the virtual environment.

  • In one instance, the bash terminal would only return an immediate timeout response. The bot tried very hard to debug issues that could cause a timeout, but ultimately my virtual environment had fallen over. I don’t put the fault on Claude here: this was an issue specifically with my environment and not something that Claude had caused.
  • The bot wouldn’t shut down old development servers before starting a new one. After a small prompt, it did.
  • I asked it to write integration tests, which it did, but running them in the virtual environment was problematic. That's not something I would worry too much about either.
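
The stale dev server problem from the second point is easy to guard against in tooling. Below is a minimal sketch of the idea: always stop the previous process before starting a new one. `Killable` and `DevServerManager` are hypothetical names; in Node the factory could wrap `child_process.spawn`, whose `ChildProcess` result also exposes `kill()`. Fake processes are used here so the example runs anywhere.

```typescript
// Sketch: always stop the previous dev server before starting a new one.
// `Killable` and the factory are stand-ins for a real child process.

interface Killable {
  kill(): void;
}

class DevServerManager {
  private current: Killable | null = null;
  private readonly start: () => Killable;

  constructor(start: () => Killable) {
    this.start = start;
  }

  restart(): void {
    this.stop(); // free the port before a new server tries to bind to it
    this.current = this.start();
  }

  stop(): void {
    if (this.current) {
      this.current.kill();
      this.current = null;
    }
  }
}

// Fake processes record when they are started and killed.
const events: string[] = [];
let counter = 0;
const manager = new DevServerManager(() => {
  const id = ++counter;
  events.push(`start ${id}`);
  return {
    kill: () => {
      events.push(`kill ${id}`);
    },
  };
});

manager.restart();
manager.restart();
manager.stop();
console.log(events.join(", ")); // start 1, kill 1, start 2, kill 2
```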

Conclusion

The introduction of Computer Use functionality in Claude Sonnet 3.5 V2 represents a significant leap forward in AI capabilities. Through my testing, I've witnessed firsthand how Claude can navigate complex development tasks, from simple image manipulation to building a full-stack web application with AWS Cloudscape components. What's particularly impressive is not just the ability to execute commands, but Claude's capacity to problem-solve, debug, and pivot when encountering obstacles.

While there were some minor issues with the virtual environment and server management, these were largely environmental rather than limitations of Claude itself. The AI demonstrated remarkable adaptability, installing missing dependencies, handling errors, and making informed decisions throughout the development process.

For developers and technical professionals, this technology represents both an opportunity and a challenge. It's clear that AI tools like Claude's Computer Use functionality can significantly accelerate development workflows.

As this technology continues to evolve, it will be fascinating to see how it shapes the future of software development and technical problem-solving. For now, it's clear that we're witnessing a transformative moment in how we interact with and utilize AI in practical, hands-on development scenarios.
