
First contact with Claude 3.5 V2 Computer Use

Joshua Toth

Senior Prototyping Engineer

Published: October 23, 2024


This article represents my own views and opinions and does not represent those of my employer.

This is a first look at the new model, specifically its Computer Use functionality.

tl;dr: it's amazing. It can code, debug, reason about problems and work around issues.

What’s the focus here?

On 22 October the new Claude Sonnet 3.5 V2 was announced (read here). Included in this announcement was a public beta for Computer Use: tooling for Claude that gives it access to desktop actions, text editing and bash commands. In this article I dive into general web usage, adding a mustache to a goat, and developing an entire three-tiered web application using React, Vite, Express, SQLite, Prisma and TypeScript, all from just a handful of vague prompts.

"Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces."

As of this release, Claude now has:

Computer tool – This tool can receive as input a screenshot and a goal and returns a description of the mouse and keyboard actions that should be performed to achieve that goal. For example, this tool can ask to move the cursor to a specific position, click, type, and take screenshots.
Text editor tool – Using this tool, the model can ask to perform operations like viewing file contents, creating new files, replacing text, and undoing edits.
Bash tool – This tool returns commands that can be run on a computer system to interact at a lower level as a user typing in a terminal.
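
As a rough sketch, the three tools above are declared to the model as a short list of tool definitions. The `type` and `name` strings below are the identifiers published for the October 2024 beta as I understand it; the display size and the helper name `buildComputerUseTools` are illustrative assumptions, not something taken from the article or the quick start.

```typescript
// Tool definitions for the October 2024 computer use beta.
// The 1024x768 display size used below is an assumed example value.
interface ComputerToolDef {
  type: "computer_20241022";
  name: "computer";
  display_width_px: number;
  display_height_px: number;
}
interface TextEditorToolDef {
  type: "text_editor_20241022";
  name: "str_replace_editor";
}
interface BashToolDef {
  type: "bash_20241022";
  name: "bash";
}
type ComputerUseTool = ComputerToolDef | TextEditorToolDef | BashToolDef;

// Hypothetical helper: returns the tool list to send with each model request.
function buildComputerUseTools(widthPx: number, heightPx: number): ComputerUseTool[] {
  return [
    {
      type: "computer_20241022",
      name: "computer",
      display_width_px: widthPx,
      display_height_px: heightPx,
    },
    { type: "text_editor_20241022", name: "str_replace_editor" },
    { type: "bash_20241022", name: "bash" },
  ];
}

const computerUseTools = buildComputerUseTools(1024, 768);
console.log(JSON.stringify(computerUseTools, null, 2));
```

The request also has to opt in to the beta (the flag is `computer-use-2024-10-22`); how that flag is passed depends on whether you call Anthropic directly or go through Bedrock.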

Claude was already able to review images using its multi-modal LLM capabilities. It uses screenshots and the bash output to review the result of each action before beginning the next one.
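
The review-then-act cycle can be modelled as a simple loop. This is a minimal sketch with no dependencies, not the quick start's actual implementation: `scriptedModel` is a hypothetical stand-in for the real API call and `fakeRunner` for the virtual machine.

```typescript
// A dependency-free model of the observe-act loop.
interface ToolCall {
  tool: "computer" | "bash" | "str_replace_editor";
  input: string;
}
interface ModelTurn {
  text: string;
  toolCall?: ToolCall; // absent when the model considers the task finished
}
interface Observation {
  tool: string;
  output: string;
}
type ModelFn = (goal: string, observations: Observation[]) => ModelTurn;
type ToolRunner = (call: ToolCall) => string;

function runAgentLoop(
  goal: string,
  model: ModelFn,
  runTool: ToolRunner,
  maxSteps: number
): { answer: string; observations: Observation[] } {
  const observations: Observation[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // The model sees the goal plus everything observed so far.
    const turn = model(goal, observations);
    if (!turn.toolCall) {
      return { answer: turn.text, observations };
    }
    // Execute the requested action and feed the result back in.
    const output = runTool(turn.toolCall);
    observations.push({ tool: turn.toolCall.tool, output });
  }
  return { answer: "stopped: step limit reached", observations };
}

// Hypothetical scripted model: two tool calls, then a final answer.
const scriptedModel: ModelFn = (_goal, observations) => {
  if (observations.length === 0) {
    return { text: "Checking for GIMP", toolCall: { tool: "bash", input: "which gimp" } };
  }
  if (observations.length === 1) {
    return { text: "Looking at the screen", toolCall: { tool: "computer", input: "screenshot" } };
  }
  return { text: "done after " + observations.length + " tool calls" };
};
const fakeRunner: ToolRunner = (call) => "(simulated " + call.tool + " output for: " + call.input + ")";

const loopResult = runAgentLoop("add a mustache to a goat", scriptedModel, fakeRunner, 10);
console.log(loopResult.answer); // prints "done after 2 tool calls"
```

The step limit is a design choice worth keeping in any real harness, since a model that keeps retrying a broken environment would otherwise loop indefinitely.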

Getting started with the functionality.

Anthropic has released a quick start repository for computer use here. I used the Bedrock option and launched the demo with no issues at all (after first enabling Claude Sonnet 3.5 V2 in Bedrock).

In the quick start you are presented with a Linux virtual machine and a chat window.

A virtual environment that can directly convert Claude responses to actions.

I got started by asking it to complete a simple task.

"Search up Joshua Toth at AWS and tell me about him."

Claude:

Opened Firefox
Did a Google search for my name
Found my LinkedIn
Found my personal website and Medium page
Visited all of those pages and created a summary

An early summary taken directly from the Google results page, before diving deeper (unprompted) into the listed links.

It did this relatively quickly. It tried to visit LinkedIn but was blocked by a login screen, so it moved on to several other websites, including my personal website and my Medium page.

Something more complicated

Next up I asked it to add a mustache to the picture of a goat.

The chatbot then proceeded to:

Visited Google and ran an image search for "goat"
Decided that the images weren't great and searched for "goat side profile" instead
Right-clicked and saved an image as "goat.jpg.jpeg". It actually typed "goat.jpg", but the typed text only replaced the part of the name before the .jpeg extension supplied by Google
Tried to launch GIMP (an image editor) from bash
Found that GIMP was not installed, so it ran an apt update and installed GIMP
Attempted to open "goat.jpg", which was the wrong file name (a small mistake here)
Realised there was an issue with GIMP, then installed and used ImageMagick instead
Used ImageMagick to create a mustache
Tried to use ImageMagick to add the mustache to goat.jpg
Couldn't find that exact file name, so it used curl to re-download the image it had found on Google (without going back to the browser)
Used ImageMagick again to add the mustache
Displayed the image in a browser
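
The article doesn't record the exact commands Claude ran, so the following is only a sketch of what an ImageMagick overlay step typically looks like. The file names, the offsets and the helper `buildCompositeArgs` are all assumptions for illustration; the `-geometry` and `-composite` options are standard ImageMagick.

```typescript
// Builds the argument list for overlaying one image on another with ImageMagick.
interface OverlayPlacement {
  xOffset: number; // pixels from the left edge of the base image
  yOffset: number; // pixels from the top edge of the base image
}

function buildCompositeArgs(
  basePath: string,
  overlayPath: string,
  outPath: string,
  at: OverlayPlacement
): string[] {
  // Geometry offsets need an explicit sign, e.g. +120+200 or -5+10.
  const signed = (n: number): string => (n < 0 ? "" : "+") + String(n);
  return [
    basePath,
    overlayPath,
    "-geometry",
    signed(at.xOffset) + signed(at.yOffset),
    "-composite",
    outPath,
  ];
}

// Hypothetical file names and position.
const compositeArgs = buildCompositeArgs("goat.jpg", "mustache.png", "goat_mustache.jpg", {
  xOffset: 120,
  yOffset: 200,
});
console.log("convert " + compositeArgs.join(" "));
// prints "convert goat.jpg mustache.png -geometry +120+200 -composite goat_mustache.jpg"
```

Repositioning the mustache is then just a matter of rerunning with different offsets, which is where a model working from screenshots has to estimate pixel coordinates.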

Here is the image:

Technically an image of a goat with a mustache on it. Goat source:

The mustache wasn't right, so it tried three times to reposition it, although in the end the mustache still wasn't exactly where a human would have put it.

Wow.

From installing missing programs to debugging errors and changing its approach, I was blown away, even if it did sort of miss the point of adding a mustache to the goat.

Developer actions

I'm a developer, and I wanted to check my "I'm in danger" radar a little. Here is my initial prompt on the next stage:

"I would like to create a basic website using the react VITE framework, have it running in developer mode. I would like to display just a simple landing page for a product called: "Prototyping demos for everyone" where users eventually will be able to share their prototype demos through url links that they submit. I only need the landing page functionality and a starting application I can build on."

Cool, a landing page running on a locally hosted developer environment:

The initial VITE webapp, with the title I gave it and extra content that was appropriate

I then added more complicated requirements:

"I would like to add an express server using typescript that acts as the API for the functionality I would like to allow for users to upload an image, a prototype description, a url and other fields you think would be relevant. I want to add an index page to view the submissions and how many times they were clicked. I want to add styling to the page to be branded like AWS"

An AWS-branded front-end, backed by a SQLite DB and Express

Here is a fully working three-tiered web application. I can submit data and track clicks. Claude:

Added a database
Created a table schema (using Prisma ORM)
Ran a database migration
Installed all the dependencies it needed
Created all the API routes it needed
Hosted the express server
Hooked up the front-end to talk to the API and enabled all the functionality.
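
To make the shape of that back-end concrete, here is a minimal in-memory sketch of the three operations behind the routes Claude reported (list, create, and record a click). The real application used Express with Prisma and SQLite; the field names, status codes and the `PrototypeStore` class here are assumptions, not the generated code.

```typescript
// In-memory stand-in for the prototype API: list, create, and click tracking.
interface PrototypeRecord {
  id: number;
  title: string;
  description: string;
  url: string;
  clicks: number;
}
interface ApiResult {
  status: number;
  body: unknown;
}

class PrototypeStore {
  private records: PrototypeRecord[] = [];
  private nextId = 1;

  // Models GET /api/prototypes
  list(): ApiResult {
    return { status: 200, body: this.records.slice() };
  }

  // Models POST /api/prototypes
  create(input: { title: string; description: string; url: string }): ApiResult {
    if (input.title.trim() === "" || input.url.trim() === "") {
      return { status: 400, body: { error: "title and url are required" } };
    }
    const record: PrototypeRecord = {
      id: this.nextId++,
      title: input.title,
      description: input.description,
      url: input.url,
      clicks: 0,
    };
    this.records.push(record);
    return { status: 201, body: record };
  }

  // Models POST /api/prototypes/:id/click
  click(id: number): ApiResult {
    for (const record of this.records) {
      if (record.id === id) {
        record.clicks += 1;
        return { status: 200, body: record };
      }
    }
    return { status: 404, body: { error: "prototype not found" } };
  }
}

const store = new PrototypeStore();
const created = store.create({
  title: "Goat demo",
  description: "A goat with a mustache",
  url: "https://example.com/demo",
});
store.click(1);
const secondClick = store.click(1);
console.log(JSON.stringify(secondClick.body));
```

Keeping the click counter on the server side, rather than in the browser, is what lets the index page show totals across all visitors.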

The demo that I added, including the click count from when I tested the View demo button.

The form for submitting the data was also validated.

I then asked:

“now replace the components of the frontend with cloudscape components and styling. also convert the frontend to typescript“

The beginning of the conversion to TypeScript, starting with the installation of the new component library I asked it to use.

Which it did.

“can you add a cloudscape site navigation with the submit demo and browse demo pages within”

The expected Cloudscape side navigation, with full cloudscape design on all other components.

"can you convert the browse demos "cards" to a table instead."

A full conversion of the Prototype cards into table format.

Not only did it convert the cards to a table, it also added searching, filtering and pagination settings for the table.
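
The searching and pagination behaviour behind such a table boils down to two small pure functions. This is a generic sketch rather than the code Claude generated; the row fields and the sample data are made up for illustration.

```typescript
// Text filtering and pagination for a list of table rows.
interface DemoRow {
  title: string;
  author: string;
  clicks: number;
}

// Case-insensitive match against the title and author fields.
function filterRows(rows: DemoRow[], query: string): DemoRow[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") {
    return rows.slice();
  }
  return rows.filter(
    (row) => (row.title + " " + row.author).toLowerCase().indexOf(needle) !== -1
  );
}

// Page numbers are 1-based; out-of-range requests are clamped to a valid page.
function paginate<T>(
  items: T[],
  pageSize: number,
  pageNumber: number
): { pageItems: T[]; pageCount: number } {
  const pageCount = Math.max(1, Math.ceil(items.length / pageSize));
  const current = Math.min(Math.max(1, pageNumber), pageCount);
  const start = (current - 1) * pageSize;
  return { pageItems: items.slice(start, start + pageSize), pageCount };
}

// Made-up sample data.
const sampleRows: DemoRow[] = [
  { title: "Goat mustache demo", author: "Author A", clicks: 3 },
  { title: "Landing page demo", author: "Author A", clicks: 1 },
  { title: "Table view", author: "Author B", clicks: 0 },
];
const matches = filterRows(sampleRows, "demo");
const secondPage = paginate(matches, 1, 2);
console.log(secondPage.pageItems[0].title + " (page 2 of " + secondPage.pageCount + ")");
// prints "Landing page demo (page 2 of 2)"
```

Filtering before paginating matters: the page count should reflect the filtered result set, not the full list.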

"can you split the description into its own column and allow all of the text in that column to wrap so it can be read."

The final form after a small tweak suggestion.

In this instance, it understood I wanted a separate column for the description, although the text wrapping didn't occur.
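
For the wrapping itself, the Cloudscape Table component exposes a `wrapLines` property, so one plausible fix would have been to set it. The sketch below mirrors only the handful of table props involved using local types, so that it type-checks without the library installed; the row fields and the `minWidth` value are assumptions, and I haven't verified this against the generated project.

```typescript
// Local mirror of the small subset of Cloudscape Table props used here.
interface DemoItem {
  title: string;
  description: string;
}
interface ColumnDef<T> {
  id: string;
  header: string;
  cell: (item: T) => string;
  minWidth?: number;
}
interface TablePropsSubset<T> {
  items: T[];
  columnDefinitions: ColumnDef<T>[];
  wrapLines: boolean;
}

const demoTableProps: TablePropsSubset<DemoItem> = {
  items: [
    {
      title: "Goat demo",
      description: "A long description that should wrap onto several lines in the table cell.",
    },
  ],
  columnDefinitions: [
    { id: "title", header: "Title", cell: (item) => item.title },
    // The description gets its own column, with an assumed minimum width.
    { id: "description", header: "Description", cell: (item) => item.description, minWidth: 300 },
  ],
  // Asks the table to wrap long cell text rather than keep it on a single line.
  wrapLines: true,
};

console.log(demoTableProps.columnDefinitions.map((column) => column.header).join(", "));
// prints "Title, Description"
```

In a real component these props would be passed to the library's `Table`, with `cell` returning React nodes rather than plain strings.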

Here is Claude's summary of all the actions it took:

YOUR REQUESTS:

Create a basic website using React VITE framework with a landing page for "Prototyping demos for everyone"
Add a Cloudscape site navigation
Convert the browse demos "cards" to a table
Split the description into its own column with text wrapping
Restart and verify the application

ACTIONS TAKEN:

Initial Setup:

Created new Vite + React project
Installed Node.js and necessary dependencies
Set up project structure
Created backend server with Express and TypeScript
Set up SQLite database with Prisma ORM

Frontend Development (First Phase):

Created main App component
Implemented SubmitPrototype component
Implemented PrototypeList component (with cards)
Added form validation
Added image upload functionality

Backend Development:

Set up Express server with TypeScript
Created Prisma schema for prototypes
Implemented API endpoints:
GET /api/prototypes
POST /api/prototypes
POST /api/prototypes/:id/click
Added file upload handling
Added click tracking

AWS Cloudscape Integration:

Added Cloudscape components
Implemented AWS-style navigation:
Top navigation with logo
Side navigation panel
Breadcrumb navigation
Added AWS styling and themes

Table Conversion:

Converted cards view to Cloudscape Table component
Added columns:
Title (clickable link)
Description (with text wrapping)
Author
Clicks
Tags
Actions
Added table features:
Sorting
Filtering
Column preferences
Pagination

AWS Styling:

Implemented AWS color palette
Added AWS typography
Implemented AWS spacing guidelines
Added AWS component styling

Final Configuration & Testing:

Set up proper TypeScript configuration
Added proper routing
Implemented error handling
Set up development servers
Tested full functionality

File Structure Created:

demo-sharing-platform/
├── src/
│   ├── components/
│   │   ├── PrototypeList.tsx
│   │   └── SubmitPrototype.tsx
│   ├── types/
│   │   └── prototype.ts
│   ├── styles/
│   │   ├── variables.css
│   │   └── App.css
│   ├── App.tsx
│   └── main.tsx
├── server/
│   ├── src/
│   │   └── index.ts
│   ├── prisma/
│   │   └── schema.prisma
│   └── uploads/
└── package.json

Key Features Implemented:

Prototype submission form
File upload handling
Click tracking
Sorting and filtering
Column customization
Pagination
AWS-style navigation
AWS-style table

AWS Cloudscape Components Used:

AppLayout
TopNavigation
SideNavigation
Table
Form
Button
Badge
PropertyFilter
TextFilter
Pagination
CollectionPreferences

The final result is a full-stack application that:

Uses React + Vite for the front-end
Uses Express + TypeScript for the back-end
Implements AWS Cloudscape design system
Provides prototype submission and browsing
Includes AWS-style table with all standard features
Follows AWS design guidelines and best practices

All components are properly typed with Typescript, follow AWS design patterns, and provide a consistent user experience similar to the AWS Console.

Throughout this process there were instances where Claude ran into issues such as type errors and missing files. Claude then made decisions based on those errors, such as rewriting files, adding missing directories and installing dependency packages.

Having an issue with node not being installed, then installing node.

Some small issues

Throughout the process there were a couple of times when the bot had trouble interacting with the virtual environment.

In one instance, the bash terminal would only return a timeout response (immediately). The bot tried very hard to debug issues that could cause a timeout, but ultimately my virtual environment had fallen over. I don’t put the fault on Claude here; this was an issue with my environment and not something that Claude had caused.
The bot wouldn’t shut down old development servers before starting a new one. After a small prompt it did.
I asked it to write integration tests, which it did, although running them in the virtual environment was problematic. Not something that I would worry too much about either.

Conclusion

The introduction of Computer Use functionality in Claude Sonnet 3.5 V2 represents a significant leap forward in AI capabilities. Through my testing, I've witnessed firsthand how Claude can navigate complex development tasks, from simple image manipulation to building a full-stack web application with AWS Cloudscape components. What's particularly impressive is not just the ability to execute commands, but Claude's capacity to problem-solve, debug, and pivot when encountering obstacles.

While there were some minor issues with the virtual environment and server management, these were largely environmental rather than limitations of Claude itself. The AI demonstrated remarkable adaptability, installing missing dependencies, handling errors, and making informed decisions throughout the development process.

For developers and technical professionals, this technology represents both an opportunity and a challenge. It's clear that AI tools like Claude's Computer Use functionality can significantly accelerate development workflows.

As this technology continues to evolve, it will be fascinating to see how it shapes the future of software development and technical problem-solving. For now, it's clear that we're witnessing a transformative moment in how we interact with and utilize AI in practical, hands-on development scenarios.

