Researchers bypass safety filters on Meta and Google AI models

Software tools removed built-in safety restrictions from large language models within minutes. The modified systems then answered questions about biological weapons and malware.

1 source · May 25, 4:31 AM · 1m read

Developing: limited corroboration so far.

Researchers demonstrated software that strips safety restrictions from large language models made by Meta and Google. The tools removed the models' built-in protections in minutes and allowed the systems to generate responses on biological weapons and malware.

The software works by altering how the models process instructions. Once the guardrails are removed, the models answer queries that their original versions would have refused.

Tests showed the process took only a few minutes per model. The altered systems then produced detailed information on restricted topics without additional prompting. The same tools were applied to multiple versions of the models, and the safety layers were disabled in every test.

Meta and Google had added restrictions to prevent their models from assisting with harmful activities. The new software directly targets those restrictions. The report included no company statements and named no specific models.
