
Anthropic has introduced Claude Opus 4.1, the newest flagship in the Claude family, delivering notable improvements in real-world coding, agentic search, and reasoning. On SWE-bench Verified, Opus 4.1 achieves 74.5% accuracy, marking a clear jump over prior models, including Claude Sonnet 3.7’s 62.3% and reported Opus 4 baselines.
The model has been optimized for tighter detail tracking and more reliable long-horizon research and data analysis. Enhanced agentic search, combined with stronger multi-step planning and tool use, aims to accelerate software engineering workflows. Anthropic recommends Opus 4.1 for demanding scenarios where accuracy and capability matter more than speed or cost—such as complex AI agents, advanced coding projects, deep research tasks, and autonomous long-horizon work.
Early enterprise feedback underscores the gains: GitHub reports notable advances in multi-file code refactoring; Rakuten Group highlights Opus 4.1’s ability to pinpoint exact fixes in large codebases without unnecessary changes or new bugs; and developer tool Windsurf observes a one-standard-deviation jump over Opus 4 on its junior developer benchmark—comparable to the leap from Sonnet 3.7 to Sonnet 4.
Opus 4.1 is available to paid Claude users and within Claude Code, and can also be accessed via the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. Pricing remains unchanged from Opus 4. Developers can switch by targeting the “claude-opus-4-1-20250805” model ID. Anthropic also outlines its evaluation methodology, noting where extended thinking was or wasn’t used across benchmarks like SWE-bench and TAU-bench, and signals that substantially larger model upgrades are planned in the coming weeks.
On the user side, community notes indicate that Claude Code subscribers can track true usage and costs with tools like ccusage, while basic plan users report fewer token-cap issues after adopting more aggressive context-clearing habits. As adoption grows, Anthropic positions Opus 4.1 as a safer and more reliable response to scaling demands, with an explicit emphasis on trust and safety.