Victoria Krakovna–AGI Ruin, Sharp Left Turn, Paradigms of AI Alignment

Duration: 1:52:26

Victoria Krakovna is a Research Scientist at DeepMind working on AGI safety and a co-founder of the Future of Life Institute, a non-profit organization working to mitigate technological risks to humanity and increase the chances of a positive future. In this interview we discuss three of her recent LessWrong posts: DeepMind Alignment Team Opinions On AGI Ruin Arguments, Refining The Sharp Left Turn Threat Model, and Paradigms of AI Alignment.

Transcript: theinsideview.ai/victoria

YouTube: https://youtu.be/ZpwSNiLV-nw

OUTLINE

(00:00) Intro

(00:48) DeepMind Alignment Team Opinions On AGI Ruin Arguments

(05:13) On The Possibility Of Iterating On Dangerous Domains And Pivotal Acts

(14:14) Alignment And Interpretability

(18:14) Deciding Not To Build AGI And Stricter Publication Norms

(27:18) Specification Gaming And Goal Misgeneralization

(33:02) Alignment Optimism And Probability Of Dying Before 2100 From Unaligned AI

(37:52) Refining The Sharp Left Turn Threat Model

(48:15) A 'Move 37' Might Disempower Humanity

(59:59) Finding An Aligned Model Before A Sharp Left Turn

(01:13:33) Detecting Situational Awareness

(01:19:40) How This Could Fail, Deception After One SGD Step

(01:25:09) Paradigms of AI Alignment

(01:38:04) Language Models Simulating Agency And Goals

(01:45:40) Twitter Questions

(01:48:30) Last Message For The ML Community
