Content provided by Jonas Christensen. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Jonas Christensen or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://player.fm/legal.

The Dos and Don’ts of Synthetic Data with Minhaaj Rehman

43:56

Ever heard of ‘synthetic data’?

Synthetic data is data that is artificially created (from statistical models), rather than generated by actual events. It is designed to retain the statistical characteristics of production data, minus the sensitive details.
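
As a rough illustration of "created from statistical models", here is a minimal Python sketch. It assumes the simplest possible model, a single Gaussian fitted to one numeric column. The `real_ages` values and the helper names `fit_gaussian` and `sample_synthetic` are made up for this example and are not taken from the episode; real generators typically model many columns and their relationships.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of real observations."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(mean, stdev, n, seed=0):
    """Draw n synthetic values from the fitted Gaussian (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Hypothetical "real" data: customer ages, invented for illustration only.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]

# Step 1: summarise the real data with a statistical model.
mean, stdev = fit_gaussian(real_ages)

# Step 2: generate new records from the model instead of copying real ones.
synthetic_ages = sample_synthetic(mean, stdev, n=1000)
```

The synthetic values follow roughly the same distribution as the real ones, but no single synthetic value corresponds to an actual person. The sketch also shows the concern raised in these notes: anything the chosen model fails to capture (skew, outliers, links to other fields) is missing from the synthetic data too.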

By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated, according to Gartner.

Organisations may choose synthetic data over actual data because it can be obtained more quickly, easily and cheaply.

But there are concerns with this approach, because synthetic data is based on models and algorithms that are designed by humans and carry their biases.

More data doesn’t necessarily equal better data.

Is synthetic data a brilliant tool for improving data quality, reducing data acquisition costs, managing privacy and reducing overfitting?

Or does synthetic data put us on a slippery slope of hard-to-interrogate models that are technically replacing fact with fiction?

To answer these questions, I recently spoke to Minhaaj Rehman, who is CEO & Chief Data Scientist at Psyda, an AI-enabled academic and industrial research agency.

In this episode of Leaders of Analytics, you will learn:

  • What synthetic data is and how it is generated
  • The most common uses for synthetic data
  • The arguments for and against using synthetic data
  • When synthetic data is most helpful and when it is most risky
  • How to implement best practices for mitigating the risks associated with synthetic data, and much more.

Episode timestamps:

00:00 Intro

03:00 What Psyda Does

04:23 Academic Work and Modern Education

06:38 Getting into Data Science

11:30 What is Synthetic Data

13:30 Common Applications for Synthetic Data

18:50 Pros & Cons of using Synthetic Data

21:29 Risks of using Synthetic Data

23:48 When should Synthetic Data be Used

29:23 Synthetic Data is Cleaner than Real Data

34:05 Using Synthetic Data for Risk Mitigation

36:05 Resources on Learning More about Synthetic Data

38:05 Human Biases in Decision Making

Connect with Minhaaj:

Minhaaj on LinkedIn: https://www.linkedin.com/in/minhaaj/

Minhaaj's website and podcast: https://minhaaj.com/

59 episodes
