The Thinking Machine: How Claude 3.7 Sonnet Changes the AI Landscape
David Borish
AI Strategist at Trace3 | Keynote Speaker | 25 Years in Technology & Innovation | NYU Guest Lecturer & AI Mentor | Author of "AI 2024" | Writer at "The AI Spectator"
Anthropic has unveiled Claude 3.7 Sonnet, its most intelligent model to date and the first hybrid reasoning model on the market. Released on February 24, 2025, this groundbreaking update introduces extended thinking capabilities, substantial improvements in coding skills, and a new agentic coding tool called Claude Code.
Extended Thinking: A New Paradigm in AI Reasoning
The standout feature of Claude 3.7 Sonnet is its ability to toggle between standard responses and "extended thinking mode." Unlike conventional AI models, which produce answers with a single pass through their parameters, Claude 3.7 Sonnet can now give itself more time to solve complex problems through multiple, sequential reasoning steps.
"With the new Claude 3.7 Sonnet, users can toggle 'extended thinking mode' on or off, directing the model to think more deeply about trickier questions," Anthropic explains. "And developers can even set a 'thinking budget' to control precisely how long Claude spends on a problem."
What makes this implementation unique is that extended thinking isn't a separate model or strategy—it's the same model applying more cognitive effort when needed, much like humans do when faced with challenging tasks. Through the API, developers have fine-grained control over this thinking process, with the ability to allocate up to 128K tokens for complex reasoning.
Perhaps most impressive is the visibility of Claude's thought process. Users can now observe Claude's step-by-step reasoning in real time, creating unprecedented transparency in AI decision-making. This visible thinking offers several benefits:
Coding Excellence and Claude Code
Claude 3.7 Sonnet demonstrates significant improvements in coding capabilities, establishing itself as "best-in-class for real-world coding tasks" according to early testing by Cursor. The model excels at handling complex codebases, planning code changes, managing full-stack updates, and executing sophisticated agent workflows.
These enhancements are reflected in benchmark performance, with Claude 3.7 Sonnet achieving state-of-the-art results on SWE-bench Verified (which evaluates AI models' ability to solve real-world software issues) and TAU-bench (which tests AI agents on complex real-world tasks with user and tool interactions).
Alongside the model update, Anthropic has introduced Claude Code—a command-line tool for agentic coding available as a limited research preview. Claude Code functions as an active collaborator that can:
Early testing shows Claude Code completing tasks "in a single pass that would normally take 45+ minutes of manual work, reducing development time and overhead."
Anthropic has also expanded GitHub integration to all Claude plans, allowing developers to connect their code repositories directly to Claude for more effective collaboration on fixing bugs, developing features, and building documentation.
Agent Capabilities and Task Performance
Claude 3.7 Sonnet features what Anthropic calls "action scaling"—an improved capability for iterative function calls and environmental interactions. This enhancement allows Claude to allocate more turns, time, and computational power to complex tasks, particularly excelling at computer use tasks where it can issue virtual mouse clicks and keyboard presses.
To demonstrate these capabilities, Anthropic had Claude play Pokémon Red, equipping it with "basic memory, screen pixel input, and function calls to press buttons and navigate around the screen." While previous versions struggled to progress beyond the starting area, Claude 3.7 Sonnet successfully battled three Pokémon Gym Leaders and won their Badges, demonstrating "super effective" strategies and the ability to improve its own capabilities as it progressed.
Safety and Responsible Development
Anthropic maintains its commitment to responsible AI development with Claude 3.7 Sonnet, conducting extensive testing and evaluation to ensure it meets safety, security, and reliability standards. The model operates under Anthropic's AI Safety Level (ASL) 2 standard, with enhanced safety measures for computer use capabilities.
Particularly notable is Claude's improved resistance to "prompt injection" attacks, where malicious third parties might hide secret messages to trick the AI into taking unintended actions. Through new training, system prompts, and a specialized classifier, Claude now prevents these attacks 88% of the time, up from 74% previously.
The model also makes "more nuanced distinctions between harmful and benign requests, reducing unnecessary refusals by 45% compared to its predecessor."
Test-Time Compute Scaling
Beyond extended thinking, Anthropic researchers have been experimenting with parallel test-time compute scaling—sampling multiple independent thought processes and selecting the best one without knowing the true answer ahead of time.
This approach yielded impressive results on the GPQA evaluation (challenging questions on biology, chemistry, and physics). Using the equivalent compute of 256 independent samples, a learned scoring model, and a maximum 64k-token thinking budget, Claude 3.7 Sonnet achieved a GPQA score of 84.8%, including a physics subscore of 96.5%. While this parallel test-time compute scaling isn't available in the current deployment, Anthropic continues to research these methods for future releases.
Availability and Pricing
Claude 3.7 Sonnet is now available on all Claude plans—including Free, Pro, Team, and Enterprise—as well as the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. Extended thinking mode is available on all surfaces except the free Claude tier.
In both standard and extended thinking modes, Claude 3.7 Sonnet maintains the same pricing as its predecessors: $3 per million input tokens and $15 per million output tokens, which includes thinking tokens.
A New Era of AI Reasoning
Claude 3.7 Sonnet represents a significant advancement in AI capabilities, particularly in reasoning, coding, and agentic tasks. By making the thinking process visible and controllable, Anthropic has created a more transparent and versatile AI system that can adapt its cognitive effort to match the complexity of the task at hand.
As Anthropic puts it, Claude 3.7 Sonnet and Claude Code "mark an important step towards AI systems that can truly augment human capabilities. With their ability to reason deeply, work autonomously, and collaborate effectively, they bring us closer to a future where AI enriches and expands what humans can achieve."