Data Shapley

Linear Digressions

16:55
 
Content provided by Linear Digressions, Ben Jaffe, and Katie Malone. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Linear Digressions, Ben Jaffe, and Katie Malone or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://player.fm/legal.
We often talk about which features in a dataset are most important, but recently a new paper has been making the rounds that turns the idea of importance on its head: Data Shapley is an algorithm for thinking about which examples in a dataset are most important. It makes a lot of intuitive sense: data that just repeats examples you've already seen, or that's noisy or an extreme outlier, might not be that valuable for training a machine learning model. But some data is very valuable: it's disproportionately useful in helping the algorithm figure out what the most important trends are, and Data Shapley is explicitly designed to help machine learning researchers spend their time understanding which data points are most valuable and why.

Relevant links:
http://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf
https://blog.acolyer.org/2019/07/15/data-shapley/
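
For a concrete picture of the idea, here is a minimal sketch of Shapley-style data valuation via permutation sampling. It is not the authors' implementation (the paper discusses more efficient estimators such as truncated Monte Carlo); the utility function, the synthetic dataset, and all names below are illustrative assumptions.

# Minimal sketch of the Data Shapley idea via permutation sampling.
# The dataset, utility function, and parameters are illustrative assumptions,
# not the implementation from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def utility(train_idx, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a model trained on the given subset of points."""
    if len(train_idx) == 0 or len(np.unique(y_tr[train_idx])) < 2:
        return 0.5  # can't fit a classifier yet; assume chance level for a balanced binary task
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr[train_idx], y_tr[train_idx])
    return accuracy_score(y_val, model.predict(X_val))

def data_shapley(X_tr, y_tr, X_val, y_val, n_perms=50, rng=None):
    """Monte Carlo estimate: average marginal contribution of each
    training point over random orderings of the training set."""
    rng = np.random.default_rng(rng)
    n = len(X_tr)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev_score = utility([], X_tr, y_tr, X_val, y_val)
        for k, i in enumerate(perm):
            score = utility(perm[: k + 1], X_tr, y_tr, X_val, y_val)
            values[i] += score - prev_score  # marginal contribution of point i
            prev_score = score
    return values / n_perms

if __name__ == "__main__":
    X, y = make_classification(n_samples=60, n_features=5, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
    vals = data_shapley(X_tr, y_tr, X_val, y_val, n_perms=20, rng=0)
    print("Most valuable training points:", np.argsort(vals)[::-1][:5])

In a sketch like this, points that merely repeat what the model has already seen tend to get marginal contributions near zero, while points that pin down the decision boundary score higher, which matches the intuition described above.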

291 episodes
