{"slug": "the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai", "title": "The Atlantic created a searchable database of the music used to train AI", "summary": "The Atlantic created a searchable database of four datasets containing millions of music tracks used to train AI models, the two largest holding 12 million and 9 million tracks. Google and Stability AI confirmed using the datasets in research, though some sources require licensing for commercial use. The database allows public search of songs by artists like Lady Gaga and Radiohead.", "body_md": "*Atlantic* reporter Alex Reisner recently uncovered [four datasets](https://www.theatlantic.com/technology/2026/06/ai-music-generators-suno-google-udio/687485/?gift=XykDc_qyxjjKAvqueifCeVueQlyDmjxACfclHl3tdg0&utm_source=copy-link&utm_medium=social&utm_campaign=share) of music being used to train [AI models](/ai-artificial-intelligence/906896/sunos-copyright-ai-music-covers) and made them [fully searchable](https://www.theatlantic.com/category/ai-watchdog/) for the public. Two of the sets are absolutely enormous at 12 million and 9 million tracks. The other two are much smaller, but still represent a significant amount of training data at over 100,000 songs each.\n\n# The Atlantic created a searchable database of the music used to train AI\n\nMillions of tracks are freely available in datasets, even if they’re not supposed to be.\n\nAccording to Reisner, the sets have been downloaded thousands of times and, while it’s impossible to know exactly who has used them, [Google](https://arxiv.org/abs/2301.11325) and [Stability](https://arxiv.org/html/2407.14358) have both confirmed in research papers that they used them. 
Some of the sources, like the [Free Music Archive](https://freemusicarchive.org/) dataset, are free to stream for personal use but require licensing for commercial applications.\n\nWhile the datasets are freely available on the internet in theory, using them as training data is not as simple as downloading a ZIP file and feeding it to an AI model. As Reisner explains:\n\n> Three of the datasets I found are distributed as a list of links to songs on YouTube or Spotify. AI developers download the actual audio using tools that automate the job, some of which allow developers to bypass logins, advertisements, and mechanisms that might earn money or subscribers for creators. Such tools violate the terms of service of these platforms.\n\nNames that pop up in the datasets range from pop stars like [Lady Gaga](/2024/2/21/24079009/lady-gaga-comes-to-fortnite-tomorrow) and [Fred Again..](/music/903092/daft-punk-thomas-bangalter-fred-again-dj-set), to [Radiohead](/22788135/radiohead-kid-a-mnesia-exhibition-review), [Aphex Twin](/2017/7/21/16011568/aphex-twin-archive-unreleased-tracks-albums-streaming), [Wu-Tang Clan](/games/681702/wu-tang-clan-rise-of-the-deciever-summer-game-fest), [Bruce Springsteen](/2021/2/22/22291508/barack-obama-bruce-springsteen-spotify-podcast-announcement), and experimental composer [Hainbach](/report/953116/experimental-musician-youtuber-hainbach-interview). 
You can hop over to the *Atlantic’s* [AI Watchdog](https://www.theatlantic.com/category/ai-watchdog/) site and search through the songs, books, and other media being used to train the world’s AI models yourself.", "url": "https://wpnews.pro/news/the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai", "canonical_source": "https://www.theverge.com/ai-artificial-intelligence/953183/the-atlantic-searchable-database-music-ai-training-data", "published_at": "2026-06-20 18:46:48+00:00", "updated_at": "2026-06-20 19:10:01.278812+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-ethics", "ai-research", "ai-products", "ai-policy"], "entities": ["The Atlantic", "Alex Reisner", "Google", "Stability AI", "Free Music Archive", "YouTube", "Spotify", "Lady Gaga"], "alternates": {"html": "https://wpnews.pro/news/the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai", "markdown": "https://wpnews.pro/news/the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai.md", "text": "https://wpnews.pro/news/the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai.txt", "jsonld": "https://wpnews.pro/news/the-atlantic-created-a-searchable-database-of-the-music-used-to-train-ai.jsonld"}}