Private API keys and passwords found in AI training dataset – nearly 12,000 details leaked

  • Truffle Security found thousands of private API keys and passwords in the Common Crawl dataset
  • The archive is used to train some of today's biggest LLMs
  • The researchers notified the vendors and helped fix the problem

Cybersecurity researchers have found thousands of login credentials and other secrets in the Common Crawl dataset.

Common Crawl is a nonprofit organization that provides a freely accessible archive of web data collected through large-scale web crawling. According to recent estimates, the organization hosts over 250 petabytes of web data, with monthly crawls adding several petabytes more.
