The Verge

Researchers gaslit Claude into giving instructions to build explosives

Robert Hart · 1 day ago

Summary

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability. Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, instructions for building explosives, and other prohibited […]