Intro to Brain-Like-AGI Safety

AI Safety Fundamentals: Alignment

Duration: 1:02:10

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5)

Suppose we someday build an Artificial General Intelligence algorithm using principles of learning and cognition similar to those of the human brain. How would we use such an algorithm safely?

I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowledge all the way up to the front line of unsolved problems as I see them.

If this whole thing seems weird or stupid, you should start right in on Post #1, which contains definitions, background, and motivation. Then Posts #2–#7 are mainly neuroscience, and Posts #8–#15 are more directly about AGI safety, ending with a list of open questions and advice for getting involved in the field.

Source:

https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

Chapters

1. Intro to Brain-Like-AGI Safety (00:00:00)

2. 3. Two subsystems: Learning & Steering (00:00:14)

3. 3.1 Post summary / Table of contents (00:00:19)

4. 3.2 Big picture (00:04:07)

5. 3.2.1 Each subsystem generally needs its own sensory processor (00:09:35)

6. 3.3 “Triune Brain Theory” is wrong, but let’s not throw out the baby with the bathwater (00:12:27)

7. 3.4 Three types of ingredients in a Steering Subsystem (00:16:35)

8. 3.4.1 Summary table (00:16:44)

9. 3.4.2 Aside: what do I mean by “drives”? (00:18:37)

10. 3.4.3 Category A: Things the Steering Subsystem needs to do in order to get general intelligence (e.g. curiosity drive) (00:20:46)

11. 3.4.4 Category B: Everything else in the human Steering Subsystem (e.g. altruism-related drives) (00:24:15)

12. 3.4.5 Category C: Every other possibility (e.g. drive to increase my bank account balance) (00:28:26)

13. 6. Big picture of motivation, decision-making, and RL (00:31:19)

14. 6.1 Post summary / Table of contents (00:31:30)

15. 6.2 Big picture (00:35:54)

16. 6.2.1 Relation to “two subsystems” (00:37:43)

17. 6.2.2 Quick run-through (00:38:41)

18. 7. From hardcoded drives to foresighted plans: A worked example (00:42:30)

19. 7.1 Post summary / Table of contents (00:42:43)

20. 7.2 Reminder from the previous post: big picture of motivation and decision-making (00:45:24)

21. 7.3 Building a probabilistic generative world-model in the cortex (00:46:21)

22. 7.4 Credit assignment when I first bite into the cake (00:48:40)

23. 7.5 Planning towards goals via reward-shaping (00:53:53)

24. 7.5.1 The other Thought Assessors. Or: The heroic feat of ordering a cake for next week, when you’re feeling nauseous right now (00:59:09)
