Fable proves Anthropic wants control, not safety
So everybody is pumping out these cool websites and video games that Fable can make in one prompt (no shade). But if you look through even a little bit of the system card for this thing, it's fucking terrifying. THOUGHTS!?

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

1. Silent degradation for AI/ML work

This is the biggest one. Anthropic says Fable 5 has special safeguards for frontier LLM development. But unlike the cyber/bio/chem safeguards, these are not visible to the user.

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.”

They say Fable 5 will not fall back to another model. Instead, they may reduce effectiveness through:
- prompt modification
- steering vectors
- PEFT/fine-tuning interventions

The user still gets an answer, but it may be an altered / steered answer without disclosure. You don’t know if Claude is failing because the task is hard, because your prompt sucks, or because Anthropic quietly nerfed the model.

2. Anti-competitive access split: Mythos for trusted partners, Fable for everyone else

Anthropic frames this as responsible deployment:
- Mythos 5 = stronger model, trusted partners only.
- Fable 5 = public model, safeguards added.

The system card says Mythos is available only to a small set of vetted/trusted partners, starting with Project Glasswing. So we've established a precedent for an intelligence class divide. The most capable model goes to governments, big labs, banks, and major tech companies. Everyone else gets the restricted one.

3. “Safety” classifiers can become capability control

For cyber, bio, chemistry, and distillation, Fable may route the user to a weaker model or block the request. Anthropic presents this as necessary risk reduction. Fair enough for bioweapons/cybercrime. But the category list includes distillation attempts and frontier AI