Victoria Krakovna–AGI Ruin, Sharp Left Turn, Paradigms of AI Alignment

Duration: 1:52:26

Victoria Krakovna is a Research Scientist at DeepMind working on AGI safety and a co-founder of the Future of Life Institute, a non-profit organization working to mitigate technological risks to humanity and increase the chances of a positive future. In this interview we discuss three of her recent LessWrong posts: DeepMind Alignment Team Opinions On AGI Ruin Arguments, Refining The Sharp Left Turn Threat Model, and Paradigms of AI Alignment.

Transcript: theinsideview.ai/victoria

YouTube: https://youtu.be/ZpwSNiLV-nw

OUTLINE

(00:00) Intro

(00:48) DeepMind Alignment Team Opinions On AGI Ruin Arguments

(05:13) On The Possibility Of Iterating On Dangerous Domains And Pivotal Acts

(14:14) Alignment And Interpretability

(18:14) Deciding Not To Build AGI And Stricter Publication Norms

(27:18) Specification Gaming And Goal Misgeneralization

(33:02) Alignment Optimism And Probability Of Dying Before 2100 From Unaligned AI

(37:52) Refining The Sharp Left Turn Threat Model

(48:15) A 'Move 37' Might Disempower Humanity

(59:59) Finding An Aligned Model Before A Sharp Left Turn

(01:13:33) Detecting Situational Awareness

(01:19:40) How This Could Fail, Deception After One SGD Step

(01:25:09) Paradigms of AI Alignment

(01:38:04) Language Models Simulating Agency And Goals

(01:45:40) Twitter Questions

(01:48:30) Last Message For The ML Community
