Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs")

Joe Carlsmith Audio


Published 11M ago, 21:25


This is section 2.2.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

Chapters

1. Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs") (00:00:00)

2. 2.2.2 Two sources of beyond-episode goals (00:00:28)

3. 2.2.2.1 Training-game-independent beyond-episode goals (00:01:32)

4. 2.2.2.1.1 Are beyond-episode goals the default? (00:03:26)

5. 2.2.2.1.2 How will models think about time? (00:05:02)

6. 2.2.2.1.3 The role of “reflection” (00:08:09)

7. 2.2.2.1.4 Pushing back on beyond-episode goals using adversarial training (00:10:56)

8. 2.2.2.2 Training-game-dependent beyond-episode goals (00:12:45)

9. 2.2.2.2.1 Can gradient descent “notice” the benefits of turning a non-schemer into a schemer? (00:14:47)

10. 2.2.2.2.2 Is SGD pulling scheming out of models by any means necessary? (00:18:51)

