Best Ben Jaffe and Katie Malone Podcasts (2024)

So long, and thanks for all the fish 35:44

4y ago

All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science; it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years. It’s been a ride, a…

A Reality Check on AI-Driven Medical Assistants 14:00

4y ago

The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting b…

A Data Science Take on Open Policing Data 23:44

4y ago

A few weeks ago, we put out a call asking data scientists interested in issues of race and racism, or people studying those topics with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, a Bay Area data scientist and a volunteer who reached out t…

Procella: YouTube's super-system for analytics data storage 29:48

4y ago

This is a re-release of an episode that originally ran in October 2019. If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found them…

The Data Science Open Source Ecosystem 23:06

4y ago

Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The h…

Rock the ROC Curve 15:52

4y ago

This is a re-release of an episode that first ran on January 29, 2017. This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars; it's a fantastic go-to metric for all your classifier quality needs. By Ben Jaffe and Katie Malone

Criminology and Data Science 30:57

4y ago

This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a …

Racism, the criminal justice system, and data science 31:36

4y ago

As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to give a prediction about the likelihood of an offender to…

An interstitial word from Ben 5:59

4y ago

A message from Ben about algorithmic bias, and how our models are sometimes reflections of ourselves. By Ben Jaffe and Katie Malone

Convolutional Neural Networks 21:55

4y ago

This is a re-release of an episode that originally aired on April 1, 2018. If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tas…

Stein's Paradox 27:02

4y ago

This is a re-release of an episode that was originally released on February 26, 2017. When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extr…

Protecting Individual-Level Census Data with Differential Privacy 21:19

4y ago

The power of fine-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a …

Causal Trees 15:27

4y ago

What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are on the case. This episode explores their algorithm f…

The Grammar Of Graphics 35:38

4y ago

You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’s a bit abstract but very profound, and these princip…

Gaussian Processes 20:55

4y ago

It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate: linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option where you can fit over all the possible types of function…

Keeping ourselves honest when we work with observational healthcare data 19:08

4y ago

The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hundreds or thousands of small, and sometimes large, dec…

Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell 28:58

4y ago

AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now to keep it much safer in the long run. Prof. Russell’s…

Putting machine learning into a database 24:22

4y ago

Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or Python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actually be deployed inside the database. Why? One strong ad…

The work-from-home episode 29:06

4y ago

Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode explores this a little bit, informally, as we compare …

Understanding Covid-19 transmission: what the data suggests about how the disease spreads 25:25

4y ago

Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but are contagious anyway. This episode digs into the epi…

Network effects re-release: when the power of a public health measure lies in widespread adoption 26:40

4y ago

This week’s episode is a re-release of a recent episode, which we don’t usually do but it seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: most of the protection you get from a vaccine, for exa…

Causal inference when you can't experiment: difference-in-differences and synthetic controls 20:48

4y ago

When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference in differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situations.…

Better know a distribution: the Poisson distribution 31:51

4y ago

This is a re-release of an episode that originally ran on October 21, 2018. The Poisson distribution is a probability distribution function used for events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things: there are a lot of interesting processes that boil down to “events that ha…

The Lottery Ticket Hypothesis 19:45

4y ago

Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides. The fascinating thing is that, for some of these sub…

Interesting technical issues prompted by GDPR and data privacy concerns 20:26

4y ago

Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of giving consumers more control and privacy around their …

Linear Digressions

Ben Jaffe and Katie Malone
