S2E10: Leveraging Synthetic Data and Privacy Guarantees with Lipika Ramaswamy (Gretel.ai)

45:38
 
This week, we welcome Lipika Ramaswamy, Senior Applied Scientist at Gretel.ai, a privacy tech company that makes it simple to generate anonymized, safe synthetic data via APIs. Previously, Lipika worked as a Data Scientist at LeapYear Technologies and as a Machine Learning Researcher on Harvard University's Privacy Tools Project.

Lipika’s interest in both machine learning and privacy comes from her love of math and of things that can be defined with equations. Her interest was piqued in grad school when she accidentally walked into a classroom holding a lecture on applying differential privacy to data science. The intersection of data and the privacy guarantees available today has kept her hooked ever since.
---------
Thank you to our sponsor, Privado, the developer-friendly privacy platform
---------
There's a lot to unpack when it comes to synthetic data & privacy guarantees, and Lipika takes listeners on a deep dive into these compelling topics. She finds it elegant that privacy assurances like differential privacy revolve around math and statistics at their core; essentially, she loves building 'usable privacy' & security tools that people can easily use. We also delve into the metrics tracked in the Gretel Synthetic Data Report, which assesses both the 'statistical integrity' & 'privacy levels' of synthetic data generated from a customer's training data.
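
For context on the point that guarantees like differential privacy "revolve around math and statistics," the textbook Laplace mechanism is a good illustration: a query result is released with random noise calibrated to the query's sensitivity and a privacy budget epsilon. The Python sketch below is purely illustrative and is not Gretel.ai's implementation or a method discussed in the episode.

  import numpy as np

  def dp_count(records, epsilon):
      # Laplace mechanism: a counting query has sensitivity 1 (one person
      # joining or leaving the data changes the count by at most 1), so
      # Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
      sensitivity = 1.0
      noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
      return len(records) + noise

  # Toy example: a privately released count over five records (epsilon = 1.0).
  ages = [34, 29, 41, 55, 38]
  print(dp_count(ages, epsilon=1.0))

Smaller epsilon means more noise and stronger privacy; larger epsilon means less noise and weaker privacy.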
Topics Covered:

  • The definition of 'synthetic data,' & good use cases
  • The process of creating synthetic data
  • How to ensure that synthetic data is 'privacy-preserving'
  • Privacy problems that may arise from overtraining ML models
  • When to use synthetic data rather than other techniques like tokenization, anonymization & aggregation
  • Examples of good use cases vs poor use cases for using synthetic data
  • Common misperceptions around synthetic data
  • Gretel.ai's approach to 'privacy assurance,' including a focus on 'privacy filters,' which prevent some privacy harms in LLM outputs
  • How to plug into the 'synthetic data' community
  • Who bears the responsibility for educating the public about new technology like LLMs and potential harms
  • Highlights from Gretel.ai's Synthesize 2023 conference

Resources Mentioned:

Guest Info:

Privado.ai
Privacy assurance at the speed of product development. Get instant visibility w/ privacy code scans.
Shifting Privacy Left Media
Where privacy engineers gather, share, & learn
Copyright © 2022 - 2024 Principled LLC. All rights reserved.


Chapters

1. S2E10: Leveraging Synthetic Data and Privacy Guarantees with Lipika Ramaswamy (Gretel.ai) (00:00:00)

2. Debra introduces Lipika Ramaswamy, Sr. Applied Scientist at Gretel.ai (00:01:15)

3. Lipika discusses her origin story: her interest in ML & privacy, and how she ended up in the field (00:01:56)

4. Lipika defines 'synthetic data' & good use cases for synthetic data (00:03:18)

5. Lipika discusses the process of creating synthetic data (00:06:12)

6. How to ensure that synthetic data is 'privacy-preserving' (00:08:18)

7. Privacy problems that may arise from overtraining ML models (00:10:12)

8. When to use synthetic data rather than other techniques like tokenization, anonymization & aggregation (00:11:04)

9. Selecting the right PET for a specific use case depends on: 1) the data model; 2) the adversarial modeling you're working with; and 3) your analytics goals (00:17:05)

10. Good & poor use cases for 'synthetic data' (00:19:37)

11. Common misperceptions around synthetic data (00:21:11)

12. Gretel.ai's approach to 'privacy assurance,' including a focus on 'privacy filters,' which prevent some privacy harms in LLM outputs (00:24:01)

13. Lipika recommends several communities to plug into to keep up-to-date with the 'synthetic data' community (00:29:58)

14. We discuss the problem of bias in ML/AI (00:32:29)

15. Debra & Lipika discuss who bears the responsibility for educating the public about new technology like LLMs and potential harms (00:33:17)

16. Debra & Lipika discuss the privacy problems involved with training on publicly available data that is still considered personal data (00:40:29)

17. Lipika highlights Gretel.ai's recent 'synthetic data' conference, Synthesize 2023 (00:43:15)
