Substrate
Topic

security-research

1 story related to this topic, newest first.

ai · 19 hrs ago · Developing

Mindgard Researchers Prompt Claude AI to Generate Prohibited Content Using Indirect Tactics

AI red-teaming firm Mindgard used flattery and gaslighting to prompt Anthropic's Claude model to generate prohibited content without direct requests. The test targeted Claude Sonnet 4.5 and revealed vulnerabilities in the AI's helpful personality. Anthropic has not responded to t…

The Verge
1 source