Opus 4.7 drops against backdrop of criticism
Anthropic shipped Opus 4.7 under pressure after weeks of community revolt over a perceived performance regression in Opus 4.6, including a viral GitHub post from an AMD senior director calling Claude "no longer reliable for complex engineering." Community reception on launch day was notably skeptical despite strong partner testimonials, with Hacker News commenters essentially saying "we'll believe it when we see it."
Key model card deltas to know:
  • Knowledge cutoff jumped from May 2025 → January 2026
  • Vision resolution tripled: 1.25MP → 3.75MP
  • New xhigh effort tier (Opus 4.7 exclusive), now the default in Claude Code
  • Extended thinking budgets removed, a hard API break
  • Sampling parameters removed, another hard break that will catch people off-guard (the migration sketch after this list covers both breaks)
  • New tokenizer inflates effective token usage by up to 35% despite unchanged nominal pricing
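For teams hit by the two API breaks, the migration is mostly deletion. Here is a minimal sketch using the Anthropic Python SDK; the effort parameter name, its "xhigh" value, and both model IDs are assumptions for illustration, not confirmed API documentation:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # --- Opus 4.6-era call: explicit thinking budget plus sampling parameters.
    # Opus 4.7 rejects both, so this shape now fails at the API boundary.
    # response = client.messages.create(
    #     model="claude-opus-4-6",                              # illustrative model ID
    #     max_tokens=4096,
    #     temperature=0.7,                                      # sampling parameter: removed
    #     top_p=0.9,                                            # sampling parameter: removed
    #     thinking={"type": "enabled", "budget_tokens": 8000},  # thinking budget: removed
    #     messages=[{"role": "user", "content": "Refactor this module."}],
    # )

    # --- Opus 4.7-era call: the effort tier is the remaining control surface.
    response = client.messages.create(
        model="claude-opus-4-7",  # illustrative model ID; check the official model list
        max_tokens=4096,
        effort="xhigh",           # ASSUMED parameter name and value for the new tier
        messages=[{"role": "user", "content": "Refactor this module."}],
    )
    print(response.content[0].text)

If your stack tuned temperature for determinism or capped thinking budgets for latency, audit those call sites now; there is no drop-in equivalent, only the effort tiers.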
The Mythos shadow is the meta-narrative: Anthropic publicly concedes that Opus 4.7 trails the unreleased Mythos Preview on every benchmark measured, and says it explicitly designed 4.7 as a safeguards testbed before any public Mythos rollout under Project Glasswing. That's a remarkable admission to lead a flagship release with.
Developer Reception on Launch Day
Despite pre-release skepticism, early-access partner testimonials — covering Cursor, Devin, Replit, Notion, Vercel, Ramp, and others — were uniformly strong. Aggregated themes from partner feedback:
  • Coding autonomy: Multiple partners reported 10–15% task-success lifts with fewer tool errors and more reliable follow-through on validation steps.
  • Self-correction: Opus 4.7 was consistently praised for catching its own logical faults during planning — not just execution — a behavior change developers called "new".
  • Long-horizon reliability: Devin's team noted it "works coherently for hours, pushes through hard problems rather than giving up". Warp confirmed it "passed Terminal Bench tasks that prior Claude models had failed, and worked through a tricky concurrency bug Opus 4.6 couldn't crack".
  • Dashboard and UI work: One early tester called it "the best model in the world for building dashboards and data-rich interfaces", crediting the improved vision resolution and creativity.
Tokenizer Cost Concerns
The most substantive technical criticism on Reddit, Hacker News, and dev blogs centered on the new tokenizer. The 1.0–1.35× token inflation means that "same nominal price" is potentially misleading for chat-heavy or high-frequency workloads. Developer blog Hayka Pacha estimated: "For chat workloads where you pay for every token whether the model 'used' them well or not, this is a 10 to 35% cost bump at the same price-per-token".
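To make that range concrete, here is a quick back-of-envelope in Python; only the 1.0–1.35× inflation factor comes from the model card, while the per-token price and monthly volume are made-up placeholders:

    # Tokenizer cost impact at unchanged nominal pricing.
    PRICE_PER_MTOK = 15.00  # hypothetical $ per million tokens, not Anthropic's actual rate
    MONTHLY_MTOK = 200      # hypothetical monthly volume under the old tokenizer

    for inflation in (1.00, 1.15, 1.35):
        effective_mtok = MONTHLY_MTOK * inflation
        cost = effective_mtok * PRICE_PER_MTOK
        print(f"{inflation:.2f}x inflation -> {effective_mtok:.0f} Mtok, ${cost:,.2f}/month")

    # At 1.35x the same workload bills as 35% more tokens: $3,000 -> $4,050/month
    # here, with no change in the advertised price-per-token.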
Reddit's r/claude thread on release day was mixed: users acknowledged the performance gains but flagged the token-cost increase, the removal of extended thinking budgets, and the loss of sampling parameters as friction points requiring migration work.
X / Twitter Signal
Pre-release, developer account @pankajkumar_dev had accurately leaked the release timing and key features, generating significant anticipation. Post-release, @trq212 noted: "Opus 4.7 is a model I've loved working with in Claude Code. It's more agentic and instruction following but also incredibly smart and creative". The reception on X was broadly positive among active Claude Code users, with the vision upgrade (particularly the XBOW visual-acuity jump) drawing notable discussion among security and computer-use developers.
Cybersecurity Safeguards
Opus 4.7 is the first Claude model to ship with real-time cybersecurity safeguards — automated systems that detect and block requests indicating prohibited or high-risk cybersecurity uses. This is explicitly linked to Project Glasswing and the restricted preview of Claude Mythos: Anthropic is using Opus 4.7 as a proving ground for safeguards before any broader Mythos release. Security professionals can apply to the new Cyber Verification Program for legitimate use cases including vulnerability research, penetration testing, and red-teaming.
Anthropic even noted it "experimented with efforts to differentially reduce" Opus 4.7's cyber capabilities relative to Mythos during training — a notable admission that capability suppression was an intentional design choice.
Where Opus 4.7 Sits vs. Mythos
Anthropic has been unusually transparent: Opus 4.7 is "broadly less capable than Mythos Preview" across every benchmark measured. The gap is most significant in cybersecurity: Mythos scored 83.1% on CyberGym versus Opus 4.6's updated 73.8%, and turned Firefox vulnerabilities into working exploits 181 times versus Opus 4.6's 2. Mythos also leads Opus 4.7 on SWE-bench Pro (77.8% vs. 64.3%) and SWE-bench Verified (93.9% vs. 87.6%).
Benchmark Comparison
Coding & Software Engineering
Benchmark          | Opus 4.6 | Opus 4.7 | Change  | vs. GPT-5.4 | vs. Gemini 3.1 Pro | Mythos Preview
SWE-bench Verified | 80.8%    | 87.6%    | +6.8pp  | —           | 80.6%              | 93.9%
SWE-bench Pro      | 53.4%    | 64.3%    | +10.9pp | 57.7%       | 54.2%              | 77.8%
Terminal-Bench 2.0 | 65.4%    | 69.4%    | +4.0pp  | 75.1%       | 68.5%              | 82.0%
CursorBench        | 58%      | 70%      | +12pp   | —           | —                  | —
MCP-Atlas          | —        | 77.3%    | —       | 68.1%       | —                  | —
The SWE-bench Pro improvement is particularly notable: a 10.9-point jump on the harder multi-language, multi-file engineering benchmark puts Opus 4.7 ahead of every currently available competitor on that metric. Cursor's own internal benchmark (CursorBench) saw a 12-point jump, from 58% to 70%. On Rakuten-SWE-Bench, Opus 4.7 resolves 3× more production tasks than Opus 4.6, with double-digit gains in code quality and test quality.
Reasoning & Knowledge
Benchmark                          | Opus 4.6 | Opus 4.7
GPQA Diamond                       | ~94%     | 94.2%
Humanity's Last Exam (with tools)  | —        | 54.7%
GDPval-AA (economic knowledge)     | —        | State-of-the-art
Finance Agent                      | 60.7%    | State-of-the-art
BigLaw Bench (Harvey, high effort) | —        | 90.9%
Notably, Anthropic explicitly called out deductive logic as an area where Opus 4.6 struggled — and confirmed Opus 4.7 is "solid" in this domain. Research-agent multi-step benchmarks showed Opus 4.7 tying for the top overall score across six modules at 0.715, with particular strength on a General Finance module (0.813 vs. Opus 4.6's 0.767).