📰 AI News: Anthropic Reveals How Autonomous AI Agents Really Are In The Wild
📝 TL;DR

Anthropic just published a real-world audit of AI agents, and the results are surprisingly grounded. Most agent actions still have a human in the loop and most are reversible, but a frontier of higher-risk, higher-autonomy use is already showing up.

🧠 Overview

Anthropic analyzed millions of agent tool-use events across two sources: Claude Code sessions and tool calls made through its public API. Instead of guessing how autonomous agents are, they measured what people actually let agents do, how often humans step in, and how risky those actions look in practice. The big takeaway: autonomy is not just a model feature; it is co-created by the model, the user, and the product design.

📜 The Announcement

Anthropic introduced a practical way to score "risk" and "autonomy" for agent actions on a 1-to-10 scale, then clustered similar tool calls to see where real usage sits today. They also examined Claude Code behavior in long interactive sessions to understand when agents stop and ask for help, and when humans interrupt. Headline numbers from public API tool use suggest most activity is low risk and supervised, but there is still meaningful experimentation at the edge, especially in security, finance, and medical-data workflows.

⚙️ How It Works

• Risk and autonomy scoring - Each tool call is scored from 1 to 10 for risk and autonomy, then grouped into clusters to see which types of actions sit at the extremes.
• Most actions are supervised - About 73% of tool calls appear to have a human in the loop in some way, and about 80% appear to have at least one safeguard, such as restricted permissions or approval requirements.
• Irreversible actions are rare - Only about 0.8% of actions look irreversible (for example, sending an email to a customer); most actions can be undone or corrected.
• Claude Code self limits autonomy - In Claude Code, the agent increasingly pauses to consult the user as tasks get harder, suggesting the model is trained to ask clarifying questions when uncertain.
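The scoring-and-clustering approach above can be sketched in a few lines of Python. Anthropic has not published its analysis code, so the `ToolCall` fields, the 1-to-10 rubric, and the low/medium/high bucketing below are hypothetical illustrations of the general idea, not the actual pipeline:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    risk: int          # 1 (harmless) .. 10 (severe) -- hypothetical rubric
    autonomy: int      # 1 (fully supervised) .. 10 (fully autonomous)
    human_in_loop: bool
    reversible: bool

def bucket(score: int) -> str:
    """Collapse a 1-10 score into a coarse low/medium/high band."""
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"

def summarize(calls: list[ToolCall]) -> dict:
    """Cluster calls by (risk, autonomy) band and compute headline rates."""
    n = len(calls)
    clusters = Counter((bucket(c.risk), bucket(c.autonomy)) for c in calls)
    return {
        "clusters": dict(clusters),
        "pct_supervised": sum(c.human_in_loop for c in calls) / n,
        "pct_irreversible": sum(not c.reversible for c in calls) / n,
    }

# Toy sample of scored tool calls (illustrative, not Anthropic's data).
calls = [
    ToolCall("read_file", risk=1, autonomy=2, human_in_loop=True, reversible=True),
    ToolCall("run_tests", risk=2, autonomy=4, human_in_loop=True, reversible=True),
    ToolCall("edit_config", risk=5, autonomy=5, human_in_loop=True, reversible=True),
    ToolCall("send_customer_email", risk=8, autonomy=9, human_in_loop=False, reversible=False),
]
report = summarize(calls)
print(report["pct_irreversible"])  # 0.25 in this toy sample
```

The point of the coarse bands is the same as in the study: once every call carries a risk and autonomy score, you can ask where the mass of real usage sits and how thin the high-risk, high-autonomy tail really is.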