jailbreaks

2 Articles
New security system drastically reduces chatbot jailbreaks
Tech

Constitutional Classifiers. (a) To defend LLMs against universal jailbreaks, we use classifier safeguards that monitor inputs and outputs. (b) To train these safeguards,...

Anthropic has a new security system it says can stop almost all AI jailbreaks
Tech

Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet. “Constitutional classifiers” are an attempt to teach LLMs value systems. Tests resulted...