
LW - Optimistic Assumptions, Longterm Planning, and "Cope" by Raemon

Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Optimistic Assumptions, Longterm Planning, and "Cope", published by Raemon on July 17, 2024 on LessWrong.

Eliezer periodically complains about people coming up with questionable plans, built on questionable assumptions, to deal with AI, and then either:

Saying "well, if this assumption doesn't hold, we're doomed, so we might as well assume it's true."

Worse: coming up with cope-y reasons to assume that the assumption isn't even questionable at all. It's just a pretty reasonable worldview.

Sometimes the questionable plan is "an alignment scheme which Eliezer thinks avoids the hard part of the problem." Sometimes it's a sketchy, reckless plan that's probably going to blow up and make things worse. Some people complain about Eliezer being a doomy Negative Nancy who's overly pessimistic.

I had an interesting experience a few months ago when I ran some beta tests of my Planmaking and Surprise Anticipation workshop, which I think are illustrative.

i. Slipping into a more Convenient World

I have an exercise where I give people the instruction to play a puzzle game ("Baba is You"). Normally you would move around and interact with the world to experiment and learn things; instead, you need to make a complete plan for solving the level, and you aim to get it right on your first try.

In the exercise, I have people write down the steps of their plan and assign a probability to each step. If there is a part of the puzzle map that you aren't familiar with, you'll have to make guesses. I recommend making 2-3 guesses for how a new mechanic might work. (I don't recommend making a massive branching tree for every possible eventuality. For the sake of the exercise not taking forever, I suggest making 2-3 branching-path plans.)

Several months ago, I had three young-ish alignment researchers do this task (each session was a 1-1 with just me and them). Each of them looked at the level for a while and said "Well, this looks basically impossible... unless this [questionable assumption I came up with that I don't really believe in] is true. I think that assumption is... 70% likely to be true."

Then they went and executed their plan. It failed. The questionable assumption was not true.

Then, each of them said again, "okay, well, here's a different sketchy assumption that I wouldn't have thought was likely, except that if it's not true, the level seems unsolvable."

I asked "what's your probability for that one being true?"

"70%."

"Okay. You ready to go ahead again?" I asked.

"Yep", they said. They tried again. The plan failed again. And then they did it a third time, still saying ~70%.

This happened with three different junior alignment researchers, making a total of 9 predictions, which were wrong 100% of the time. (The third guy, on the second or third time, said "well... okay, I was wrong last time. So this time let's say it's... 60%.")

My girlfriend ran a similar exercise with another group of young smart people, with similar results. "I'm 90% sure this is going to work" ... "okay, that didn't work."

Later I ran the exercise again, this time with a mix of younger and more experienced AI safety folk, several of whom leaned more pessimistic. I think the group overall did better. One of them actually made the correct plan on the first try. One of them got it wrong, but gave an appropriately low estimate for themselves.
Another of them (call them Bob) made three attempts, and gave themselves ~50% odds on each attempt. They went into the experience thinking "I expect this to be hard but doable, and I believe in developing the skill of thinking ahead like this." But after each attempt, Bob was surprised by how out-of-left-field their errors were. They'd predicted they'd be surprised... but they were surprised in surprising ways - even in a simplified, toy domain that was optimized for ...
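For a sense of how poorly calibrated those repeated "70%" guesses were, here is a quick back-of-the-envelope check (my own sketch, not part of the original post; it assumes the nine guesses were roughly independent):

```python
# Calibration sanity check (illustrative sketch; independence is assumed).
# The post reports nine assumptions each rated ~70% likely, all of which
# turned out false. If those 70% figures were well calibrated, how
# surprising is a 0-for-9 outcome?

p_true = 0.7    # stated confidence in each questionable assumption
n_guesses = 9   # three researchers, three attempts each

p_all_wrong = (1 - p_true) ** n_guesses
print(f"P(all {n_guesses} guesses false | calibrated): {p_all_wrong:.6f}")
# Prints roughly 0.000020 (about 1 in 50,000), which suggests the 70%
# numbers were tracking hope rather than calibrated probability.
```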

