Anthropic's Claude 3.5 Sonnet Outperforms GPT-4o Across Multiple Benchmarks

Divyang Garg

President New Technology | Sr. Solutions Architect | Data Analyst & Engineering | Cloud | IoT | Big Data | AI/ML | Reporting

Published: June 26, 2024

Anthropic has launched Claude 3.5 Sonnet, its latest mid-tier model, which outperforms competing models and even the company's current top-tier model, Claude 3 Opus, across a range of evaluations. The model is available now at no cost on Claude.ai and the Claude iOS app, with higher rate limits for Claude Pro and Team subscribers. It is also accessible via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The model costs $3 per million input tokens and $15 per million output tokens and has a context window of 200K tokens.
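To make the quoted pricing concrete, here is a minimal sketch of a per-request cost estimate. The function name `estimate_cost_usd` is hypothetical and only the two prices and the 200K window come from the article. Treating the window as a cap on input plus output tokens is a simplifying assumption, and actual billing may differ.

```python
# Prices and context size as quoted in the article (USD per million tokens).
INPUT_USD_PER_MILLION = 3
OUTPUT_USD_PER_MILLION = 15
CONTEXT_WINDOW_TOKENS = 200_000
TOKENS_PER_MILLION = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of one request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Simplifying assumption: input and output together must fit in the window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the 200K-token context window")
    # Sum in whole dollars-per-million units first, then scale once at the end.
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    cost = estimate_cost_usd(input_tokens=150_000, output_tokens=2_000)
    print(f"${cost:.2f}")  # prints $0.48
```

Under these assumptions, a request with a 150,000-token prompt and a 2,000-token reply comes to roughly $0.48, with the prompt accounting for $0.45 of that.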

Anthropic claims that Claude 3.5 Sonnet "establishes new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval)." The model demonstrates enhanced capabilities in understanding nuances, humor, and complex instructions, while excelling in generating high-quality content with a natural tone.

Operating at double the speed of Claude 3 Opus, Claude 3.5 Sonnet is particularly suited for complex tasks such as context-sensitive customer support and multi-step workflow orchestration. In internal evaluations, it solved 64% of problems in agentic coding, significantly outperforming Claude 3 Opus, which managed 38%.

The model also shows improved vision capabilities, surpassing Claude 3 Opus on standard vision benchmarks, especially in tasks requiring visual reasoning such as interpreting charts and transcribing text from imperfect images. These capabilities are valuable in industries such as retail, logistics, and financial services.

Anthropic has introduced Artifacts on Claude.ai alongside the model launch, enhancing user interaction by allowing real-time viewing, editing, and collaboration on content generated by Claude. Despite its enhanced intelligence, Claude 3.5 Sonnet maintains Anthropic’s rigorous commitment to safety and privacy, ensuring that generative models are not trained on user-submitted data without explicit permission.

Looking ahead, Anthropic plans to expand the Claude 3.5 model family with upcoming releases like Claude 3.5 Haiku and Claude 3.5 Opus later this year. The company is also focused on developing new modalities and features to support various business use cases, including integrations with enterprise applications and a memory feature for personalized user experiences.
