Anthropic Was So Concerned About Its New Mythos-Based Model’s Power That It Lobotomized Its Ability to Improve Itself

Anthropic, led by Dario Amodei, unveiled Fable 5, a new AI model powered by its Mythos architecture, on Tuesday. The company framed the model as 'safe for general use' but implemented safeguards designed to prevent self-improvement, particularly in frontier LLM development. These measures blocked requests related to cybersecurity, biology, chemistry, and AI distillation, frustrating researchers who argued the restrictions were excessive and opaque. Researchers accused Anthropic of 'lobotomizing' Fable 5’s capabilities, with SemiAnalysis tweeting that the model would reject ML research and degrade performance subtly. Others claimed the company was 'shadowbanning' AI developers by silently downgrading responses without user awareness. The controversy escalated after Anthropic’s system card revealed these interventions would not be visible to users, raising concerns about covert sabotage of competitive AI efforts. Under pressure, Anthropic reversed course, announcing it would make the safeguards transparent. In a statement to Wired, the company admitted it had 'made the wrong trade-off' and apologized for failing to balance safety with research accessibility. AI startup Prime Intellect’s Will Brown criticized the approach, stating it implied Anthropic believed only its team should conduct AI research. The debate follows Anthropic’s earlier warnings about 'recursive self-improvement,' where AI systems rapidly enhance themselves beyond human control. The company has previously called for a global AI advancement freeze, citing risks of losing oversight. Fable 5’s release highlights tensions between safety precautions and the open development needed to advance AI technology responsibly.

Anthropic Was So Concerned About Its New Mythos-Based Model’s Power That It Lobotomized Its Ability to Improve Itself

Comments (0)