Anthropic releases Claude Sonnet 4.5 with advanced coding and agent capabilities


AI company Anthropic has released Claude Sonnet 4.5, a new flagship model that the company positions as its most capable for coding, building complex AI agents, and using computer systems, with significant gains in reasoning and mathematics.

The new model is available now and is accompanied by a new developer toolkit and major updates across the Claude product line.

Sonnet 4.5 features that stand out

According to Anthropic’s blog post, the model achieves state-of-the-art performance on SWE-bench Verified, a benchmark that measures real-world software engineering ability. It also improves on OSWorld, a benchmark that tests an AI model’s ability to carry out real-world tasks on a computer, such as navigating websites and filling in spreadsheets.

The company also reports that experts in finance, law, medicine, and STEM found Sonnet 4.5 to have dramatically better domain-specific knowledge and reasoning compared to previous models.

New tools for developers: The Claude Agent SDK

Alongside the new model, Anthropic has launched the Claude Agent SDK. This software development kit provides developers with the same infrastructure the company uses to power its Claude Code product, enabling them to build their own custom AI agents. The SDK is designed to solve common challenges in agent development, such as managing memory for long-running tasks, handling permission systems, and coordinating subagents working toward a shared goal.

Product updates across the Claude ecosystem

The launch of Sonnet 4.5 includes several significant upgrades to existing Claude products.

  • Claude Code: Introduces checkpoints that allow users to save progress and roll back to a previous state, a refreshed terminal interface, and a native VS Code extension.
  • Claude API: Adds a new context editing feature and a memory tool to help agents run longer and handle more complex tasks.
  • Claude Apps: Users on paid plans can now execute code and create files, such as spreadsheets, slides, and documents, directly within their conversations.
  • Claude for Chrome Extension: Now available to Max users who previously joined the waitlist.

Focus on safety and alignment

Anthropic states that Claude Sonnet 4.5 is its most aligned model to date, with improvements in reducing undesirable behaviors like deception and sycophancy. The model is released under the company’s AI Safety Level 3 (ASL-3) framework, which includes safeguards like classifiers designed to detect potentially dangerous inputs and outputs, particularly those related to chemical, biological, radiological, and nuclear (CBRN) weapons.

Imagine with Claude

For a limited time, Anthropic is offering a research preview called “Imagine with Claude” for its Max subscribers. In this demonstration, the model generates software in real time in response to user requests, with no prewritten code. The preview is designed to showcase what Sonnet 4.5 can do when combined with the right infrastructure.

Availability and pricing

Claude Sonnet 4.5 is available now through the Claude API. The pricing is the same as the previous Claude Sonnet 4 model, at $3 per million input tokens and $15 per million output tokens.
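To make the per-token pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the two list prices quoted above. The function name and the example token counts are hypothetical, chosen only for illustration; the sketch also assumes flat pricing with no discounts, caching, or batch rates.

```python
# List prices quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat list prices.

    Hypothetical helper: assumes no discounts, caching, or batch pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```

For the illustrative request, input costs $0.60 and output costs $0.30, so the estimate comes to $0.90. Because output tokens cost five times as much as input tokens, long generations dominate the bill faster than long prompts do.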

Anthropic recommends upgrading to Sonnet 4.5 for all uses, as it provides improved performance for the same cost.

Claude Sonnet 4.5 vs GPT-5: Which one should you use for your next project?

The release of Claude Sonnet 4.5 has intensified the competition at the forefront of artificial intelligence, directly challenging GPT-5.

While both models represent advanced AI development, they showcase distinct strengths, particularly in the realms of coding, agentic capabilities, and overall performance.

At a glance: Key differences

Feature | Claude Sonnet 4.5 | GPT-5
Primary strength | Agentic coding, computer use, and long-duration autonomous tasks | Unified intelligence, advanced reasoning, and multimodal capabilities
SWE-bench Verified | 77.2% (standard), 82% (high-compute) | 72.8%
OSWorld benchmark | 61.4% | Not specified, but Sonnet 4.5 leads the chart
Developer tools | Claude Agent SDK, native VS Code extension, Claude Code with checkpoints | Accessed via API and integrated into products like ChatGPT and Microsoft Copilot
Unique features | Can operate autonomously for over 30 hours; enhanced safety and alignment features | Unified system that blends multiple AI models; dynamically adjusts its reasoning approach based on task complexity

Coding and developer focus

Anthropic bills Claude Sonnet 4.5 as the “best coding model in the world,” and the reported benchmark results support the claim. On SWE-bench Verified, which measures a model’s ability to solve real-world GitHub issues, Sonnet 4.5 scores 77.2%, ahead of GPT-5’s 72.8%. With additional computing power, Sonnet 4.5’s score rises to 82%.

Furthermore, on Terminal-Bench, a test of an AI’s ability to use a command-line interface, Sonnet 4.5 achieved a 50% success rate, significantly ahead of GPT-5’s 43.8%. This suggests that for developers and technical users who need an AI to perform complex, multi-step tasks in a terminal environment, Sonnet 4.5 holds a distinct advantage.
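To put these leads in perspective, the short Python sketch below computes each gap both in percentage points and relative to GPT-5's score, using only the figures quoted above. The dictionary layout and the function name are illustrative assumptions, not part of any benchmark tooling.

```python
# Scores quoted in the article, as (Sonnet 4.5, GPT-5) in percent.
SCORES = {
    "SWE-bench Verified": (77.2, 72.8),
    "Terminal-Bench": (50.0, 43.8),
}


def lead(sonnet: float, gpt5: float) -> tuple[float, float]:
    """Return the lead in percentage points and as a percent of GPT-5's score."""
    points = sonnet - gpt5
    relative = points / gpt5 * 100
    return points, relative


if __name__ == "__main__":
    for name, (sonnet, gpt5) in SCORES.items():
        points, relative = lead(sonnet, gpt5)
        print(f"{name}: +{points:.1f} points ({relative:.1f}% relative)")
```

On these figures the SWE-bench Verified lead is 4.4 points (about 6.0% relative), while the Terminal-Bench lead is 6.2 points (about 14.2% relative), which is why the terminal result reads as the larger of the two advantages.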

In contrast, GPT-5 is presented as a powerful, general-purpose coding model. While it set new state-of-the-art benchmarks at the time of its release, the specialized focus of Sonnet 4.5 appears to give it an edge in developer-centric tasks.

Agentic capabilities and computer use

A standout feature of Claude Sonnet 4.5 is its ability to function as a long-running autonomous agent. Reports indicate the model can maintain focus and performance on complex tasks for more than 30 hours, a significant increase from previous models. This endurance is crucial for tasks that require sustained effort, such as large-scale code refactoring or in-depth data analysis.

On the OSWorld benchmark, which evaluates an AI’s ability to perform real-world tasks on a computer, Sonnet 4.5 has taken the top spot with a success rate of 61.4%. This proficiency is further demonstrated in its tool use capabilities, where it scored a remarkable 98.0% in the Telecom domain of the τ-bench evaluations, nearly doubling the performance of its predecessor and surpassing GPT-5.

GPT-5, on the other hand, is designed as a unified system that can intelligently switch between different reasoning approaches based on the task’s complexity. This allows it to handle a wide variety of tasks efficiently, but it does not emphasize the same long-duration autonomy as Sonnet 4.5.

Reasoning, math, and general performance

In areas of general reasoning and mathematics, the competition is much closer. On the AIME 2025 high school math competition, Sonnet 4.5 achieved a perfect 100% score when using Python, slightly edging out GPT-5’s 99.6%. For graduate-level reasoning, as measured by the GPQA Diamond benchmark, the models are highly competitive, with GPT-5 holding a slight lead.

Early user reports and hands-on tests suggest that Sonnet 4.5 is noticeably faster…
