LW - The need for multi-agent experiments by Martín Soto

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The need for multi-agent experiments, published by Martín Soto on August 1, 2024 on LessWrong.
TL;DR: Let's start iterating on experiments that approximate real, society-scale multi-AI deployment
Epistemic status: These ideas seem like my most prominent delta with the average AI Safety researcher, have stood the test of time, and are shared by others I intellectually respect. Please attack them fiercely!
Multi-polar risks
Some authors have already written about multi-polar AI failure. I especially like how Andrew Critch has tried to sketch concrete stories for it.
But, without even considering concrete stories yet, I think there's a good a priori argument in favor of worrying about multi-polar failures:
We care about the future of society. Certain AI agents will be introduced, and we think they could reduce our control over the trajectory of this system. The way in which this could happen can be divided into two steps:
1. The agents (with certain properties) are introduced in certain positions
2. Given the agents' properties and positions, they interact with each other and the rest of the system, possibly leading to big changes
So in order to better control the outcome, it seems worth it to try to understand and manage both steps, instead of limiting ourselves to (1), which is what the alignment community has traditionally done.
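As a purely illustrative aside (not from the original post), the two steps above can be read as the skeleton of a toy multi-agent experiment: step (1) is how agents with given properties get placed into roles, and step (2) is the interaction dynamics we would actually want to study. The sketch below is a minimal Python stand-in; the Agent fields, the role names, and the "control" proxy are all hypothetical placeholders.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    honesty: float   # an example "property" set in step (1)
    position: str    # an example "position" in the system, set in step (1)


def deploy(n_agents: int) -> list[Agent]:
    """Step (1): agents with certain properties are introduced in certain positions."""
    roles = ["assistant", "negotiator", "resource_manager"]
    return [Agent(honesty=random.random(), position=random.choice(roles))
            for _ in range(n_agents)]


def interact(agents: list[Agent], rounds: int = 100) -> float:
    """Step (2): given their properties and positions, agents interact.

    Tracks a crude proxy for how much control over the system's trajectory
    is retained after the interactions.
    """
    control = 1.0
    for _ in range(rounds):
        a, b = random.sample(agents, 2)
        # Toy rule: conflictual interactions between less-honest agents erode control.
        if a.honesty < 0.5 and b.honesty < 0.5:
            control *= 0.99
    return control


if __name__ == "__main__":
    agents = deploy(n_agents=20)
    print(f"retained control after interactions: {interact(agents):.3f}")
```

Real experiments would of course replace the random agents and the toy erosion rule with actual AI systems and richer environment dynamics; the point is only that (1) and (2) are separable levers, and that studying (2) requires running the interaction step at all.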
Of course, this is just one, very abstract argument, which we should update based on observations and more detailed technical understanding. But it makes me think the burden of proof is on multi-agent skeptics to explain why (2) is not important.
Many have taken on that burden. The most common reason to dismiss the importance of (2) is expecting a centralized intelligence explosion, a fast and unipolar software takeoff, like Yudkowsky's FOOM. Proponents usually argue that the intelligences we are likely to train will, after meeting a sharp threshold of capabilities, quickly bootstrap themselves to capabilities drastically above those of any other existing agent or ensemble of agents.
And that these capabilities will allow them to gain near-complete strategic advantage and control over the future. In this scenario, all the action is happening inside a single agent, and so you should only care about shaping its properties (or delaying its existence).
I tentatively expect more of a decentralized hardware singularity[1] than centralized software FOOM. But there's a weaker claim in which I'm more confident: we shouldn't right now be near-certain of a centralized FOOM.[2] I expect this to be the main crux with many multi-agent skeptics, and won't argue for it here (but rather in an upcoming post).
Even given a decentralized singularity, one can argue that the most leveraged way for us to improve multi-agent interactions is by ensuring that individual agents possess certain properties (like honesty or transparency), or that at least we have enough technical expertise to shape them on the go. I completely agree that this is the natural first thing to look at.
But I think focusing on multi-agent interactions directly is a strong second, and a lot of marginal value might lie there given how neglected they've been until now (more below). I do think many multi-agent interventions will require certain amounts of single-agent alignment technology. This will of course be a crux with alignment pessimists.
Finally, for this work to be counterfactually useful it's also required that AI itself (in decision-maker or researcher positions) won't iteratively solve the problem by default. Here, I do think we have some reasons to expect (65%) that intelligent enough AIs aligned with their principals don't automatically solve catastrophic conflict. In those worlds, early interventions can make a big difference in setting the right incentives for future agents, or providi...