
Content provided by Marketplace. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Marketplace or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://player.fm/legal.

For data-hungry tech companies, YouTube is a gold mine

11:41
 

Companies competing in the chatbot wars are using something known in the industry as “the Pile” to train their large language models. It’s a trove of open-source data made up of text scraped from all around the internet, including Wikipedia and the European Parliament. Annie Gilbertson, investigative reporter for Proof News, recently took a deep dive into the Pile and discovered something else: a dataset called “YouTube Subtitles.” Marketplace’s Lily Jamali spoke with Gilbertson about her investigation and how YouTube creators feel about their content being used without their consent.


10915 episodes

