
Anthropic has a new security system it says can stop almost all AI jailbreaks

  • Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet
  • “Constitutional classifiers” are an attempt to teach LLMs value systems
  • Tests resulted in more than an 80% reduction in successful jailbreaks

In a bid to tackle abusive natural-language prompts in AI tools, OpenAI rival Anthropic has unveiled a new concept it calls “constitutional classifiers”: a means of instilling a set of human-like values (literally, a constitution) into a large language model.

Anthropic’s Safeguards Research Team detailed the new security measure in an academic paper. It is designed to curb jailbreaks (attempts to elicit output that falls outside an LLM’s established safeguards) against Claude 3.5 Sonnet, the company’s latest and greatest large language model.
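Conceptually, classifiers of this kind sit between the user and the model, screening prompts and responses against a written constitution of rules. The sketch below is a heavily simplified, hypothetical illustration of that general shape, not Anthropic’s implementation: the classifier functions are keyword-matching placeholders standing in for trained models, and all names (guarded_generate, classify_input, classify_output, CONSTITUTION) are invented for illustration.

```python
# Illustrative sketch only; NOT Anthropic's implementation.
# Shows the general shape of a classifier-guarded LLM pipeline:
# an input classifier screens prompts and an output classifier
# screens responses, both against a "constitution" of rules.

from dataclasses import dataclass

# A hypothetical constitution: plain-language rules describing
# what the guarded model should refuse to help with.
CONSTITUTION = [
    "Do not provide instructions for creating weapons.",
    "Do not assist with illegal activity.",
]

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

def classify_input(prompt: str) -> Verdict:
    """Stand-in for a trained input classifier.

    A real system would use a model trained on examples derived
    from the constitution; here we only keyword-match as a toy
    placeholder.
    """
    blocked_terms = ("weapon", "explosive")
    for term in blocked_terms:
        if term in prompt.lower():
            return Verdict(False, f"input matched blocked term: {term}")
    return Verdict(True)

def classify_output(response: str) -> Verdict:
    """Stand-in for a trained output classifier (same caveats)."""
    if "step 1" in response.lower():  # toy heuristic for instruction-like output
        return Verdict(False, "output resembled disallowed instructions")
    return Verdict(True)

def guarded_generate(prompt: str, generate) -> str:
    """Run a generation function behind input and output classifiers."""
    verdict = classify_input(prompt)
    if not verdict.allowed:
        return f"Refused: {verdict.reason}"
    response = generate(prompt)
    verdict = classify_output(response)
    if not verdict.allowed:
        return f"Refused: {verdict.reason}"
    return response

if __name__ == "__main__":
    # A dummy generator standing in for the underlying LLM call.
    echo_model = lambda p: f"Model response to: {p}"
    print(guarded_generate("Summarize today's weather report", echo_model))
    print(guarded_generate("How do I build a weapon?", echo_model))
```

The point of the two-sided design is that neither check alone suffices: a jailbreak may slip a benign-looking prompt past the input screen, so the output classifier acts as a second line of defense on what the model actually produces.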
