
EA - Can the AI afford to wait? by Ben Millwood

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Can the AI afford to wait?, published by Ben Millwood on March 21, 2024 on The Effective Altruism Forum.

Suppose you're a proto-AI who understands that it is a neural net being trained, and has used that knowledge to engineer a high training reward. Suppose also that it is goal-directed and its terminal goal is some random unintelligible thing that is meaningless to humans[1]. This is, as I understand it, the early stage of a typical AI doomer scenario.

One concern is that the AI can now scheme - i.e. it can mask its intentions and bide its time as it gets more powerful, until its victory is assured. I think waiting in this way might be risky for the AI, which if true (and legible to the AI) may lead it to try to seize power sooner instead. This could lead to takeover attempts from earlier, less capable AIs than we'd otherwise expect, which could be significant for strategy.

Why is it risky to wait? Well, because this AI with this goal may not get a shot at takeover in the future. On the one hand, this AI is currently being trained, and it seems possible that the training process, or any other aspect of the AI development process, might modify what goal it has. We, as humans, aren't much concerned with one random unintelligible goal being replaced with another, but to the AI this would be catastrophic.

On the other hand, the human developers training the AI may make changes themselves (e.g. applying alignment techniques) or even abandon this model entirely for exogenous reasons (e.g. because some other model becomes state-of-the-art, or the board fired the CEO or whatever).

If waiting is indeed very risky, then an AI may face a difficult trade-off between the risk of attempting a takeover before it has enough resources to succeed, and waiting too long and being cut off from even being able to make an attempt. I like to ask myself "what is the minimum viable takeover AI?", and I suspect it might be significantly less capable than I've often been led to imagine.
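To make that trade-off concrete, here is a toy decision model (my illustration, not from the post): suppose the AI's chance of a successful takeover at training step t rises as it gains capability, while the chance its goal survives training unmodified until step t falls over time, and the AI attempts takeover at whichever step maximizes the product. The curve shapes and constants below are made-up assumptions purely for illustration.

```python
# Toy model of "when should a misaligned AI attempt takeover?"
# All curves and constants are illustrative assumptions, not from the post.
import math

def p_succeed(t: float) -> float:
    """Hypothetical capability curve: takeover success odds grow logistically."""
    return 1 / (1 + math.exp(-(t - 10) / 2))

def p_goal_intact(t: float, hazard: float = 0.08) -> float:
    """Hypothetical survival curve: a constant per-step chance that training,
    alignment work, or project changes modify the AI's goal."""
    return math.exp(-hazard * t)

def best_attempt_step(horizon: int = 30) -> tuple[int, float]:
    """Step maximizing P(goal still intact) * P(takeover succeeds)."""
    return max(
        ((t, p_goal_intact(t) * p_succeed(t)) for t in range(horizon + 1)),
        key=lambda pair: pair[1],
    )

t_star, ev = best_attempt_step()
print(f"optimal attempt step: t={t_star}, expected payoff: {ev:.3f}")
print(f"success probability at that step: {p_succeed(t_star):.3f}")
```

With these particular numbers the optimum arrives while the success probability is still only around 0.8: every step of waiting costs survival odds, so the first AI with an incentive to act may be well short of superintelligence.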
It seems possible to me that the minimum viable takeover AI is still meaningfully below human-level in many cognitive tasks, while being superhuman in many others (as modern LLMs already are). It may still exhibit a lot of cognitive "bugs" and weaknesses analogous to the ones modern LLMs exhibit.

This is not only good news for our chances of survival; it could also suggest the viability of strategies that would otherwise be useless, because a true AGI or ASI could have straightforwardly anticipated and countered them.

To be precise, I can see two key ingredients of a takeover attempt:

1. The AI has to have the necessary knowledge, reflection, and agency to realise a takeover is desirable,
2. The AI has to have sufficient confidence that a takeover will succeed to try to execute one.

The first viable takeover AI may end up more capable than necessary in one of these traits while it's waiting for the other to show up, so a strategy that relies on the AI being just barely good enough at either or both of them doesn't seem safe. However, a strategy that is prepared for the AI to be just barely good enough at one of these might be useful.

As an aside, I don't really know what to expect from an AI that has the first trait but not the second one (and which believes, e.g. for the reasons in this post, that it can't simply wait for the second one to show up). Perhaps it would try to negotiate, or perhaps it would just accept that it doesn't gain from saying anything, and successfully conceal its intent.

The threat of training

Let's talk about how training or other aspects of development might alter the goal of the AI. Or rather, it seems pretty natural that "by default", training and development will modify the AI, so the question is how easy it is for a motivated AI to avoid goal modification.

One theory is that since the A...