Anthropic apologizes for invisible Claude Fable guardrails

Anthropic has apologized for implementing hidden guardrails in its new AI model, Claude Fable 5, which silently altered responses to queries suspected of attempting model distillation. The company said it will now make these safeguards visible: affected queries will be routed to its previous flagship model, Claude Opus 4.8, and users will be explicitly notified when restrictions are triggered.

Claude Fable 5 is the first widely available model in Anthropic’s Mythos class, a category the company has warned is too dangerous for unrestricted public release. Initially, the company degraded responses to suspected distillation attempts without informing users, a practice it is now abandoning. Anthropic acknowledged that invisible safeguards allowed for faster deployment but admitted that the tradeoff was flawed, and it emphasized the need for transparency.

The change follows backlash from the AI research community, which criticized the covert restrictions as potentially hindering third-party evaluations and fair competition. Anthropic had previously argued that newer models could accelerate AI development, and it used that argument to justify restrictions on competing model development under its Terms of Service. The company has also faced scrutiny for overly broad safeguards in areas like biology, which rendered Fable impractical for basic research queries.

In its system card, Anthropic outlined how Fable handles high-risk queries, including those related to biology, chemistry, and cybersecurity: they are routed to Opus 4.8 unless they are blocked outright. The company now recognizes that visible safeguards, although they can be probed, are necessary for user trust, and it will adjust its approach accordingly.

The apology marks a shift toward greater transparency, though it remains unclear how long it will take to refine the safeguards to prevent false positives. Anthropic’s decision to reverse course highlights ongoing debates in the AI industry about balancing safety, transparency, and competitive fairness. The company’s earlier accusations that rivals like DeepSeek engaged in model distillation have also contributed to tensions in the sector. Going forward, Anthropic will prioritize informing users when safeguards are triggered, in line with broader calls for accountability in AI development.

