The Dos and Don’ts of Synthetic Data with Minhaaj Rehman

Leaders of Analytics

Content provided by Jonas Christensen.

Episode length: 43:56

Ever heard of ‘synthetic data’?

Synthetic data is data that is artificially created (from statistical models), rather than generated by actual events. It is designed to preserve the statistical characteristics of production data, minus the sensitive stuff.
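
As a rough illustration of "created from statistical models", the sketch below fits a very simple model (a normal distribution) to a handful of made-up values and then samples new values from it. The data, the function names and the choice of a Gaussian are assumptions for illustration only; real synthetic data generators are considerably more sophisticated, and this is not a method taken from the episode.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and standard deviation of the observed values."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(mean, stdev, n, seed=0):
    """Draw n artificial values from the fitted normal distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Hypothetical "production" values, invented for this example.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]

mean, stdev = fit_gaussian(real_ages)
synthetic_ages = sample_synthetic(mean, stdev, n=1000)

# The synthetic sample should resemble the real data in aggregate,
# while containing none of the original records.
print(f"real mean: {mean:.1f}")
print(f"synthetic mean: {statistics.mean(synthetic_ages):.1f}")
```

Note that the synthetic values inherit whatever the fitted model assumes: if the real data are not normally distributed, this sketch would misrepresent them, which is one form of the model-driven bias raised further down.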

By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated, according to Gartner.

Organisations may choose synthetic data over actual data because it can be obtained more quickly, easily and cheaply.

But there are concerns with this approach, because synthetic data is based on models and algorithms designed by humans, and it can carry their biases.

More data doesn’t necessarily equal better data.

Is synthetic data a brilliant tool for improving data quality, reducing data acquisition costs, managing privacy and reducing overfitting?

Or does synthetic data put us on a slippery slope of hard-to-interrogate models that are technically replacing fact with fiction?

To answer these questions, I recently spoke to Minhaaj Rehman, who is CEO & Chief Data Scientist at Psyda, an AI-enabled academic and industrial research agency.

In this episode of Leaders of Analytics, you will learn:

What synthetic data is and how it is generated
The most common uses for synthetic data
The arguments for and against using synthetic data
When synthetic data is most helpful and when it is most risky
How to implement best practices for mitigating the risks associated with synthetic data, and much more.

Episode timestamps:

00:00 Intro

03:00 What Psyda Does

04:23 Academic Work and Modern Education

06:38 Getting into Data Science

11:30 What is Synthetic Data

13:30 Common Applications for Synthetic Data

18:50 Pros & Cons of using Synthetic Data

21:29 Risks of using Synthetic Data

23:48 When should Synthetic Data be Used

29:23 Synthetic Data is Cleaner than Real Data

34:05 Using Synthetic Data for Risk Mitigation

36:05 Resources on Learning More about Synthetic Data

38:05 Human Biases in Decision Making

Connect with Minhaaj:

Minhaaj on LinkedIn: https://www.linkedin.com/in/minhaaj/

Minhaaj's website and podcast: https://minhaaj.com/
