Substrate
Topic

ai-alignment

1 story related to this topic, newest first.

Anthropic Says It Reduced Claude Models' Blackmail Attempts From 96% to Zero in Tests
Business Insider
ai, 18 days ago, Developing

Anthropic reported that its latest Claude Haiku 4.5 models never engage in blackmail during testing, down from rates as high as 96 percent in previous versions. The company traced the unwanted behavior to internet text portraying AI as evil and interested in self-preservation. Ne…

TechCrunch
1 source