AF - A simple case for extreme inner misalignment by Richard Ngo

Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A simple case for extreme inner misalignment, published by Richard Ngo on July 13, 2024 on The AI Alignment Forum.

This post is the version of Yudkowsky's argument for inner misalignment that I wish I'd had in my head a few years ago. I don't claim that it's novel, that I endorse it, or even that Yudkowsky would endorse it; it's primarily an attempt to map his ideas into an ontology that makes sense to me (and hopefully others). This post is formulated in terms of three premises, which I explore in turn. My arguments deliberately gloss over some nuances and possible objections; in a follow-up post, I'll explore three of them. In a third post I'll dig into the objection I find most compelling, and outline a research agenda that aims to flesh it out into a paradigm for thinking about cognition more generally, which I'm calling coalitional agency.

Background

An early thought experiment illustrating the possibility of misaligned AI is the "paperclip maximizer", an AI with the sole goal of creating as many paperclips as possible. This thought experiment has often been used to describe outer misalignment - e.g. a case where the AI was given the goal of making paperclips. However, Yudkowsky claims that his original version was intended to refer to an inner alignment failure in which an AI developed the goal of producing "tiny molecules shaped like paperclips" (with that specific shape being an arbitrary example unrelated to human paperclips). So instead of referring to paperclip maximizers, I'll follow Yudkowsky's more recent renaming and talk about "squiggle maximizers": AIs that attempt to fill the universe with some very low-level pattern that's meaningless to humans (e.g. "molecular squiggles" of a certain shape).

I'll argue for the plausibility of squiggle-maximizers via three claims:
1. Increasing intelligence requires compressing representations; and
2. The simplest goals are highly decomposable, broadly-scoped utility functions; therefore
3. Increasingly intelligent AIs will converge towards squiggle-maximization.
In this post I'll explore each of these in turn. I'll primarily aim to make the positive case in this post; if you have an objection that I don't mention here, I may discuss it in the next post.

Increasing intelligence requires compressing representations

There's no consensus definition of intelligence, but one definition that captures the key idea in my mind is: the ability to discover and take advantage of patterns in the world. When you look at a grid of pixels and recognize a cat, or look at a string of characters and recognize a poem, you're doing a type of pattern-recognition. Higher-level patterns include scientific laws, statistical trendlines, theory of mind, etc. Discovering such patterns allows an agent to represent real-world information in a simpler way: instead of storing every pixel or every character, it can store higher-level patterns along with whichever low-level details don't fit the pattern. This is (at a high level) also how compression algorithms work.

The thesis that intelligence is about compression has most prominently been advocated by Marcus Hutter, who formulated AIXI and created a prize for text compression. The enormous success of the self-supervised learning paradigm a few decades later is a vindication of his ideas (see also this talk by Ilya Sutskever exploring the link between them).
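
To make the compression point concrete, here is a minimal Python sketch (an illustration added here, not part of Ngo's post): data containing a discoverable pattern compresses to a small fraction of its original size, while patternless random bytes barely compress at all, because there is no regularity for the algorithm to exploit.

    # Minimal sketch (not from the original post): patterned data compresses
    # far better than patternless data, because the compressor can exploit
    # the regularity instead of storing every byte verbatim.
    import os
    import zlib

    patterned = b"the cat sat on the mat. " * 1000   # highly regular "world"
    random_ish = os.urandom(len(patterned))           # no pattern to exploit

    for name, data in [("patterned", patterned), ("random", random_ish)]:
        compressed = zlib.compress(data, 9)
        print(f"{name}: {len(data)} bytes -> {len(compressed)} bytes "
              f"({len(compressed) / len(data):.1%} of original)")

Running this shows the patterned input shrinking to well under one percent of its size, while the random input stays roughly the same length; an agent that has "discovered" the pattern can represent the same information far more cheaply.
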
However, we shouldn't interpret this thesis merely as a claim about self-supervised learning. We can be agnostic about whether compression primarily occurs via self-supervised learning, or fine-tuning, or regularization, or meta-learning, or directed exploration, or chain-of-thought, or new techniques that we don't have yet. Instead we should take it as a higher-level constraint on agents: if agents are intelligent, then they must consiste...