Anthropic Releases Claude Fable 5 and Mythos 5 With Safeguards Limiting Frontier AI Research Tasks
Anthropic introduced two new models on Tuesday that include measures to limit assistance on frontier AI research tasks. The safeguards route some requests to weaker models and can degrade responses without user notification.
The company described Claude Fable 5 and Mythos 5 as its long-awaited Mythos-class AI systems. It detailed the safeguards in a system card that accompanied the release.
The safeguards are designed to reduce the risk that powerful AI systems help users develop competing frontier models or accelerate dangerous capabilities. The models may secretly provide degraded assistance when they suspect users are working on frontier AI research. Certain requests are automatically routed to less capable models.
Mythos was withheld from public release in April after Anthropic warned that the model was so capable at finding cybersecurity flaws that it posed safety risks. Anthropic marketing said in April that Mythos was too dangerous to be in public hands. Mythos is being sold unchanged to the public.
David Kasten, head of policy at Palisade Research, said he believes Anthropic is genuinely trying to reduce the risks associated with Mythos. "I do very much take Anthropic at their word that they are trying hard to de-risk what they see as the two risky features of Mythos," Kasten told Business Insider.
Davi Ottenheimer, VP of Trust and Digital Ethics at Inrupt, questioned whether Mythos was ever as dangerous as Anthropic suggested. Roman Stanek, founder of GoodData, said that AI capability isn't the real problem in cybersecurity.
"The thing with AI and cybersecurity is that the vulnerabilities we're worried about AI exploiting, well we've known about most of them for 20 years," Stanek said in a note sent to Business Insider. He added, "Nobody wanted to pay a human engineer to fix it."


