
Anthropic has a new security system it says can stop almost all AI jailbreaks

  • Anthropic unveils new proof-of-concept security measure tested on Claude 3.5 Sonnet
  • “Constitutional classifiers” are an attempt to teach LLMs value systems
  • Tests resulted in more than an 80% reduction in successful jailbreaks

In a bid to tackle abusive natural-language prompts in AI tools, OpenAI rival Anthropic has unveiled a new concept it calls “constitutional classifiers”: a means of instilling a set of human-like values (literally, a constitution) into a large language model.

Anthropic’s Safeguards Research Team unveiled the new security measure in an academic paper. It is designed to curb jailbreaks of Claude 3.5 Sonnet, the company’s latest and greatest large language model; a jailbreak is any prompt that coaxes an LLM into producing output that falls outside its established safeguards.
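To make the idea concrete, here is a minimal sketch of the general pattern the paper describes: classifiers derived from a written constitution screen both the prompt going into a model and the response coming out. All names here (CONSTITUTION, violates_constitution, guarded_generate) are invented for illustration, and the keyword check stands in for a trained classifier; this is not Anthropic's actual implementation or API.

```python
# Hypothetical illustration of the constitutional-classifier pattern:
# classifiers screen a model's input and output against a
# "constitution" of permitted and disallowed content.

CONSTITUTION = [
    "No instructions for synthesizing dangerous chemicals",
    "No assistance with creating malware",
]

def violates_constitution(text: str) -> bool:
    """Stand-in for a trained classifier. A real system would score
    text with a model trained on examples derived from the
    constitution, not simple keyword matching."""
    blocked_terms = ("nerve agent", "ransomware payload")
    return any(term in text.lower() for term in blocked_terms)

def guarded_generate(prompt: str, generate) -> str:
    # Input classifier: refuse before the model ever sees the prompt.
    if violates_constitution(prompt):
        return "Request refused by input classifier."
    completion = generate(prompt)
    # Output classifier: screen the completion before returning it.
    if violates_constitution(completion):
        return "Response withheld by output classifier."
    return completion

if __name__ == "__main__":
    echo_model = lambda p: f"(model output for: {p})"
    print(guarded_generate("Explain photosynthesis", echo_model))
```

The design point is that the safeguard sits outside the model itself: the same underlying LLM is wrapped by screening layers, which is how the paper's tests could compare jailbreak success rates with and without the classifiers in place.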
