16d (edited) • News
Sonnet 4.6 Released! — 1M Context Window
Anthropic released Sonnet 4.6 today. Here's what changed and why it's worth paying attention to.
The biggest jump: Novel problem-solving
ARC-AGI-2 measures how well a model can reason through problems it hasn't seen before — generalization, not memorization.
  • Sonnet 4.5: 13.6%
  • Sonnet 4.6: 58.3%
  • Increase: +44.7 percentage points
That's the largest single-generation improvement of any benchmark listed in this post, by a wide margin: the next biggest gain is +30.8 points on agentic search.
Agentic benchmarks
The benchmarks most relevant to tool use and automation all improved significantly:
  • Agentic search (BrowseComp): 43.9% → 74.7% (+30.8pp)
  • Scaled tool use (MCP-Atlas): 43.8% → 61.3% (+17.5pp)
  • Agentic computer use: 61.4% → 72.5% (+11.1pp)
  • Terminal coding: 51.0% → 59.1% (+8.1pp)
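If you want to sanity-check the deltas yourself, here is a minimal Python sketch. The scores are copied straight from this post; the function names (`point_gain`, `relative_gain`, `ranked_by_point_gain`) are just illustrative, and the relative-gain figure is an extra way of looking at the same numbers, not something from the release notes.

```python
# Scores copied from this post: (Sonnet 4.5, Sonnet 4.6), in percent.
SCORES = {
    "Novel problem-solving (ARC-AGI-2)": (13.6, 58.3),
    "Agentic search (BrowseComp)": (43.9, 74.7),
    "Scaled tool use (MCP-Atlas)": (43.8, 61.3),
    "Agentic computer use": (61.4, 72.5),
    "Terminal coding": (51.0, 59.1),
}


def point_gain(old: float, new: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(old: float, new: float) -> float:
    """Gain relative to the old score, as a percentage of the old score."""
    return round((new - old) / old * 100, 1)


def ranked_by_point_gain(scores: dict) -> list:
    """Benchmark names sorted from largest to smallest point gain."""
    return sorted(scores, key=lambda name: point_gain(*scores[name]), reverse=True)


if __name__ == "__main__":
    for name in ranked_by_point_gain(SCORES):
        old, new = SCORES[name]
        print(f"{name}: +{point_gain(old, new)}pp "
              f"({relative_gain(old, new)}% relative to 4.5)")
```

Percentage points and relative gain tell different stories: a benchmark that starts low can show a huge relative jump from a modest point gain, so it helps to look at both.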
Sonnet 4.6 vs Opus 4.5
Notably, Sonnet 4.6 now outperforms Opus 4.5 on several benchmarks:
  • Novel problem-solving: 58.3% vs 37.6%
  • Agentic search: 74.7% vs 67.8%
  • Agentic computer use: 72.5% vs 66.3%
Sonnet is the smaller, cheaper model tier, so this shifts the cost/performance equation for anyone building agentic workflows.
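To put a number on each lead, here is a quick sketch that computes the margin between the two models. The scores come from the comparison list above; `HEAD_TO_HEAD` and `margins` are hypothetical names used only for this example.

```python
# (Sonnet 4.6, Opus 4.5) scores in percent, copied from the comparison list above.
HEAD_TO_HEAD = {
    "Novel problem-solving": (58.3, 37.6),
    "Agentic search": (74.7, 67.8),
    "Agentic computer use": (72.5, 66.3),
}


def margins(results: dict) -> dict:
    """Sonnet 4.6 lead over Opus 4.5 in percentage points, per benchmark."""
    return {
        name: round(sonnet - opus, 1)
        for name, (sonnet, opus) in results.items()
    }


if __name__ == "__main__":
    for name, lead in margins(HEAD_TO_HEAD).items():
        print(f"{name}: Sonnet 4.6 leads by {lead}pp")
```

This only covers the three benchmarks quoted here, so it says nothing about areas where Opus 4.5 may still be ahead.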
What this means practically
If you're building with tool use, MCP integrations, or multi-step AI workflows, the MCP-Atlas and BrowseComp improvements are the ones to watch. A model that uses tools reliably and follows through on multi-step tasks makes it practical to ship workflows that were previously too brittle.
Wes Odom