Content provided by Michaël Trazzi. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Michaël Trazzi or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

Jesse Hoogland on Developmental Interpretability and Singular Learning Theory

43:11

Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge studying AI Safety. More recently, Jesse has been thinking about Singular Learning Theory and Developmental Interpretability, which we discuss in this episode. Before he came to grips with existential risk from AI, he co-founded a health-tech startup automating bariatric surgery patient journeys.

(00:00) Intro

(03:57) Jesse’s Story And Probability Of Doom

(06:21) How Jesse Got Into Singular Learning Theory

(08:50) Intuition behind SLT: the loss landscape

(12:23) Does SLT actually predict anything? Phase Transitions

(14:37) Why care about phase transition, grokking, etc

(15:56) Detecting dangerous capabilities like deception in the (devel)opment

(17:24) A concrete example: magnets

(20:06) Why Jesse Is Bullish On Interpretability

(23:57) Developmental Interpretability

(28:06) What Happens Next? Jesse’s Vision

(31:56) Toy Models of Superposition

(32:47) Singular Learning Theory Part 2

(36:22) Are Current Models Creative? Reasoning?

(38:19) Building Bridges Between Alignment And Other Disciplines

(41:08) Where To Learn More About Singular Learning Theory

Make sure I upload regularly: https://patreon.com/theinsideview

Youtube: https://youtu.be/713KyknwShA

Transcript: https://theinsideview.ai/jesse

Jesse: https://twitter.com/jesse_hoogland

Host: https://twitter.com/MichaelTrazzi

Patreon supporters:

- Vincent Weisser

- Gunnar Höglund

- Ryan Coppolo

- Edward Huff

- Emil Wallner

- Jesse Hoogland

- William Freire

- Cameron Holmes

- Jacques Thibodeau

- Max Chiswick

- Jack Seroy

- JJ Hepburn

54 episodes
