登录查看更多内容

Introducing Anthropic's Claude 3.5 Sonnet, and Claude 3.5 Haiku

Aditi Khare

AWS & AI Research Specialist-Principal Machine Learning Scientist & AI Architect | IIM-A | Author | AI Research [Portfolio] Build Production-Grade AI Products from Scratch | Vision Transformers??Open-Source Contributor

发布日期: 2024年10月23日

+ 关注

#ai #airesearchpapers #genai #claude #anthropic

For more information on AI Research Papers you can visit my Github Profile -

https://github.com/aditikhare007/AI_Research_Junction_Aditi_Khare

For Receving latest updates on Latest Advancements in AI Research Papers Summaries @Generative AI @Quantum AI @GPUs Optimzation @Deep Learning @Vision you can subscribe to my AI Research Papers Summaries Newsletter using below link -

https://www.dhirubhai.net/newsletters/7152631955203739649/

Thank you & Happy Reading !!

Claude 3.5 Sonnet - Software Engineering Skills

The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.

Early customer feedback suggests the upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding. GitLab, which tested the model for DevSecOps tasks, found it delivered stronger reasoning (up to 10% across use cases) with no added latency, making it an ideal choice to power multi-step software development processes.

Cognition uses the new Claude 3.5 Sonnet for autonomous AI evaluations, and experienced substantial improvements in coding, planning, and problem-solving compared to the previous version. The Browser Company, in using the model for automating web-based workflows, noted Claude 3.5 Sonnet outperformed every model they’ve tested before.

As part of our continued effort to partner with external experts, joint pre-deployment testing of the new Claude 3.5 Sonnet model was conducted by the US AI Safety Institute (US AISI) and the UK Safety Institute (UK AISI).

upgraded Claude 3.5 Sonnet is also been evaluated for catastrophic risks and found that the ASL-2 Standard, as outlined in our Responsible Scaling Policy, remains appropriate for this model.

Claude 3.5 Haiku - State-of-the-art meets affordability and speed

Claude 3.5 Haiku is the next generation of our fastest model. For the same cost and similar speed to Claude 3 Haiku, Claude 3.5 Haiku improves across every skill set and surpasses even Claude 3 Opus, the largest model in our previous generation, on many intelligence benchmarks. Claude 3.5 Haiku is particularly strong on coding tasks. For example, it scores 40.6% on SWE-bench Verified, outperforming many agents using publicly available state-of-the-art models—including the original Claude 3.5 Sonnet and GPT-4o.

With low latency, improved instruction following, and more accurate tool use, Claude 3.5 Haiku is well suited for user-facing products, specialized sub-agent tasks, and generating personalized experiences from huge volumes of data—like purchase history, pricing, or inventory records.

HackerRank 5 个月前

OpenAI’s Next Big Thing, Templates for Anthropic, an…

HackerRank 1 个月前

The elevation of human work

Reid Hoffman 2 年前

Claude 3.5 Haiku will be made available later this month across our first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI—initially as a text-only model and with image input to follow.

Teaching Claude to navigate computers, responsibly -

With computer use, we're trying something fundamentally new. Instead of making specific tools to help Claude complete individual tasks, we're teaching it general computer skills—allowing it to use a wide range of standard tools and software programs designed for people. Developers can use this nascent capability to automate repetitive processes, build and test software, and conduct open-ended tasks like research.

To make these general skills possible, we've built an API that allows Claude to perceive and interact with computer interfaces. Developers can integrate this API to enable Claude to translate instructions (e.g., “use data from my computer and online to fill out this form”) into computer commands (e.g. check a spreadsheet; move the cursor to open a web browser; navigate to the relevant web pages; fill out a form with the data from those pages; and so on). On OSWorld, which evaluates AI models' ability to use computers like people do, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category—notably better than the next-best AI system's score of 7.8%. When afforded more steps to complete the task, Claude scored 22.0%.

While we expect this capability to improve rapidly in the coming months, Claude's current ability to use computers is imperfect. Some actions that people perform effortlessly—scrolling, dragging, zooming—currently present challenges for Claude and we encourage developers to begin exploration with low-risk tasks. Because computer use may provide a new vector for more familiar threats such as spam, misinformation, or fraud, we're taking a proactive approach to promote its safe deployment. We've developed new classifiers that can identify when computer use is being used and whether harm is occurring. You can read more about the research process behind this new skill, along with further discussion of safety measures, in our post on developing computer use.

Upgraded Claude 3.5 Sonnet is now available for all users - Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The new Claude 3.5 Haiku will be released later this month.

Amazon Bedrock - https://aws.amazon.com/bedrock/claude/

Vertex AI -https://cloud.google.com/blog/products/ai-machine-learning/announcing-anthropics-claude-3-5-sonnet-on-vertex-ai-providing-more-choice-for-enterprises

Reference Reading Links -

Anthropic Blog - https://www.anthropic.com/news/3-5-models-and-computer-use

Claude | Computer use for automating operations Demo - https://youtu.be/ODaHJzOyVCQ

Introducing Anthropic's Claude 3.5 Sonnet, and Claude 3.5 Haiku

Aditi Khare

AWS & AI Research Specialist-Principal Machine Learning Scientist & AI Architect | IIM-A | Author | AI Research [Portfolio] Build Production-Grade AI Products from Scratch | Vision Transformers??Open-Source Contributor

Claude 3.5 Sonnet - Software Engineering Skills

Claude 3.5 Haiku - State-of-the-art meets affordability and speed

领英推荐

Teaching Claude to navigate computers, responsibly -

AI Research Junction

1,564 位关注者

更多精彩文章

社区洞察

其他会员也浏览了

Top Landing AI Highlights of 2023

ODSC’s AI Weekly Recap: Week of July 5th

Curious AI 45

Issue #296 - The ML Engineer ??

?? Devin AI has unlocked a new software development era, an overview of the latest LLMs, and the gist of my book

GenAI Weekly — Edition 34

Building a GenAI/LLM app on AWS with Anthropic Claude

Issue #200 - THE ML ENGINEER ??

Issue #208 - THE ML ENGINEER???

Build a Retrieval Augmented System (RAG) system in just 4 lines of code!

Claude 3.5 Sonnet - Software Engineering Skills

Claude 3.5 Haiku - State-of-the-art meets affordability and speed

领英推荐

Teaching Claude to navigate computers, responsibly -

AI Research Junction

1,564 位关注者

OpenAI's AI Powered Search Engine Into ChatGPT

2024年11月1日

OpenAI Introduces Swarm, a Framework for Building Multi-Agent Systems

2024年10月12日

Architecture Search Framework for Inference-Time Techniques & Designing Priors for Better Few-Shot Image Synthesis

2024年10月7日

Meta's Llama 3.2 - Edge AI & Vision with Open, Customizable Models

2024年9月28日

Agents in Software Engineering-Survey, Landscape, and Vision & Qwen2.5-Coder

2024年9月24日

Anthropic Introduces Contextual Retrieval Using Prompt Caching & Contextual Embeddings & Reranking Techniques

2024年9月23日

Google's Training Language Models to Self-Correct via Reinforcement Learning & Iteration of Thought - Autonomous Large Language Model Reasoning

2024年9月22日

Learning to Reason with LLMs - Introducing OpenAI o1

2024年9月14日

LongCite - Enabling LLMs to Generate Fine-grained Citations in Long-context QA

2024年9月10日

Role of RAG Noise in Large Language Models & Strategic Chain-of-Thought

2024年9月9日

社区洞察

其他会员也浏览了

Top Landing AI Highlights of 2023

ODSC’s AI Weekly Recap: Week of July 5th

Curious AI 45

Issue #296 - The ML Engineer ??

?? Devin AI has unlocked a new software development era, an overview of the latest LLMs, and the gist of my book

GenAI Weekly — Edition 34

Building a GenAI/LLM app on AWS with Anthropic Claude

Issue #200 - THE ML ENGINEER ??

Issue #208 - THE ML ENGINEER???

Build a Retrieval Augmented System (RAG) system in just 4 lines of code!