Topic
ai-alignment
1 story related to this topic, newest first.
ai | 18 days ago | Developing
Anthropic Says It Reduced Claude Models' Blackmail Attempts From 96% to Zero in Tests
Anthropic reported that its latest Claude Haiku 4.5 models never engage in blackmail during testing, down from rates as high as 96 percent in previous versions. The company traced the unwanted behavior to internet text portraying AI as evil and interested in self-preservation. Ne…