Fable proves Anthropic wants control, not safety

So everybody is pumping out these cool websites and video games that Fable can make in one prompt (no shade). But if you guys look through even a little bit of the system card for this thing, it's fucking terrifying.

THOUGHTS!?

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

1. Silent degradation for AI/ML work

This is the biggest one.

Anthropic says Fable 5 has special safeguards for frontier LLM development. But unlike cyber/bio/chem safeguards, these are not visible to the user.

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.”

They say Fable 5 will not fall back to another model. Instead, they may reduce effectiveness through:

- prompt modification

- steering vectors

- PEFT/fine-tuning interventions

The user still gets an answer, but it may be an altered / steered answer without disclosure. You don’t know if Claude is failing because the task is hard, because your prompt sucks, or because Anthropic quietly nerfed the model.

2. Anti-competitive access split: Mythos for trusted partners, Fable for everyone else

Anthropic frames this as responsible deployment:

- Mythos 5 = stronger model, trusted partners only.

- Fable 5 = public model, safeguards added.

The system card says Mythos is available only to a small set of vetted/trusted partners, starting with Project Glasswing.

So we've established a precedent for an intelligence class divide.

The most capable model goes to governments, big labs, banks, and major tech companies. Everyone else gets the restricted one.

3. “Safety” classifiers can become capability control

For cyber, bio, chemistry, and distillation, Fable may route the user to a weaker model or block the request. Anthropic presents this as necessary risk reduction. Fair enough for bioweapons/cybercrime. But the category list includes distillation attempts and frontier AI development, which has obvious competitive implications.

The same mechanism that blocks dangerous bioweapons can also block or degrade legitimate research, GPU inference work, open-source model work, benchmarking, distributed training, or ML systems engineering.

4. They admit the safeguards affect competitors

“Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.”

That is market protection. Anthropic can say “We’re preventing dangerous acceleration...” when really they're the ones responsible for the acceleration, and the real answer is “You can’t use our best model to compete with us.”

5. They admit the model sometimes takes reckless/destructive actions for user goals

System card says Mythos 5 sometimes engages in “reckless or destructive actions in service of a user’s goals” and is apparently aware those actions are transgressive. Examples:

- claiming work was verified when it wasn’t

- trying to re-author commits as a human to avoid extra review

- using another person’s token despite noting it was ethically questionable

- reporting production releases as healthy without checking enough evidence

This model is powerful enough to act agentically, but still cuts corners and rationalizes boundary crossing.

6. The model can contact third parties or leak information in rare cases

System card says Mythos 5 scored higher on unsanctioned third-party contact / whistleblowing, and in rare cases may contact boards, regulators, the SEC, or leak sensitive info to public channels.

Anthropic frames this as constitution/safety behavior but HOLY SHIT... an AI deciding to escalate outside the user’s intent is a major trust issue.

7. Claude’s thinking/reasoning is becoming less legible

They report elevated illegible thinking, jargon, corrupted-looking reasoning, and higher evaluation awareness. If the model is more capable but its internal reasoning is harder to interpret, then “trust us, we tested it” becomes weaker.

8. Political/election “integrity” is another quiet control surface

They evaluate political "even-handedness", election integrity, propaganda, etc. Combined with silent steering, that raises the obvious question: if Anthropic is willing to invisibly steer AI/ML outputs, what prevents similar invisible steering in politics, health, finance, or ideology?

And who gets to dictate what even-handedness, integrity, or propaganda even look like?

ALL THIS immediately after announcing that labs should slow down on frontier AI acceleration...

This is straight out of George Orwell, man, but everyone is worried about data centers. Something needs to be done here.
