Anthropic's Claude 3.5 Sonnet Outperforms GPT-4o Across Multiple Benchmarks
Divyang Garg
President New Technology | Sr. Solutions Architect | Data Analyst & Engineering | Cloud | IoT | Big Data | AI/ML | Reporting
Anthropic has launched Claude 3.5 Sonnet, its latest mid-tier model, which the company says outperforms competitor models and even its own current top-tier model, Claude 3 Opus, across a range of evaluations. The model is available now at no cost on Claude.ai and the Claude iOS app, with higher rate limits for Claude Pro and Team subscribers, and is also accessible via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. It is priced at $3 per million input tokens and $15 per million output tokens and has a context window of 200K tokens.
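To put the per-token pricing in concrete terms, here is a minimal sketch of the arithmetic in Python. The two prices are the ones quoted above; the function name `estimate_cost_usd` and the sample token counts are hypothetical, chosen only for illustration, and actual bills may differ from this simple estimate.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MILLION = 3.00
OUTPUT_PRICE_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper for illustration; it only applies the two
    per-million-token prices and ignores anything else that may
    affect a real bill.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_PRICE_PER_MILLION
             + output_tokens * OUTPUT_PRICE_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a long 150,000-token prompt with a 2,000-token reply.
    cost = estimate_cost_usd(150_000, 2_000)
    print(f"${cost:.2f}")  # prints "$0.48"
```

In this example the input accounts for $0.45 and the output for $0.03, which shows how the fivefold higher output price matters mainly for workloads that generate long responses.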
Anthropic claims that Claude 3.5 Sonnet "sets new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval)." The company says the model is markedly better at grasping nuance, humor, and complex instructions, and that it writes high-quality content in a natural tone.
Operating at twice the speed of Claude 3 Opus, Claude 3.5 Sonnet is particularly suited to complex tasks such as context-sensitive customer support and multi-step workflow orchestration. In an internal agentic coding evaluation, it solved 64% of problems, compared with 38% for Claude 3 Opus.
The model also shows improved vision capabilities, surpassing Claude 3 Opus on standard vision benchmarks. The gains are most noticeable in tasks that require visual reasoning, such as interpreting charts and transcribing text from imperfect images, which are valuable in industries such as retail, logistics, and financial services.
Alongside the model launch, Anthropic has introduced Artifacts on Claude.ai, a feature that lets users view, edit, and collaborate on content generated by Claude in real time. Anthropic says that, even with the gain in capability, Claude 3.5 Sonnet maintains the company's commitment to safety and privacy: its generative models are not trained on user-submitted data without explicit permission.
Looking ahead, Anthropic plans to expand the Claude 3.5 model family with upcoming releases like Claude 3.5 Haiku and Claude 3.5 Opus later this year. The company is also focused on developing new modalities and features to support various business use cases, including integrations with enterprise applications and a memory feature for personalized user experiences.