Private API keys and passwords found in AI training dataset – nearly 12,000 details leaked

  • Truffle Security found thousands of pieces of private info in Common Crawl
  • The archives are used to train some of the biggest LLMs today
  • The researchers notified the vendors and helped fix the problem

Cybersecurity researchers have found thousands of login credentials and other secrets in the Common Crawl dataset.

Common Crawl is a nonprofit organization that provides a freely accessible archive of web data, collected through large-scale web crawling. According to recent estimates, the organization hosts more than 250 petabytes of web data, and each monthly crawl adds several petabytes more.

