AF - A more systematic case for inner misalignment by Richard Ngo

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A more systematic case for inner misalignment, published by Richard Ngo on July 20, 2024 on The AI Alignment Forum.

This post builds on my previous post making the case that squiggle-maximizers are plausible. The argument I presented was a deliberately simplified one, though, and glossed over several possible issues. In this post I'll raise and explore three broad objections. (Before looking at mine, I encourage you to think of your own biggest objections to the argument, and jot them down in the comments.)

Intelligence requires easily-usable representations

"Intelligence as compression" is an interesting frame, but it ignores the tradeoff between simplicity and speed. Compressing knowledge too heavily makes it difficult to use. For example, it's very hard to identify most macroscopic implications of the Standard Model of physics, even though in theory all of chemistry could be deduced from it. That's why both humans and LLMs store a huge number of facts and memories in ways that our minds can access immediately, using up more space in exchange for rapid recall. Even superintelligences which are much better than humans at deriving low-level facts from high-level facts would still save time by storing the low-level facts as well.

So we need to draw a distinction between having compressed representations, and having only compressed representations. The latter is what would compress a mind overall; the former could actually increase the space requirements, since the new compressed representations would need to be stored alongside non-compressed representations.

This consideration makes premise 1 from my previous post much less plausible. In order to salvage it, we need some characterization of the relationship between compressed and non-compressed representations. I'll loosely define systematicity to mean the extent to which an agent's representations are stored in a hierarchical structure where representations at the bottom could be rederived from simple representations at the top. Intuitively speaking, this measures the simplicity of representations weighted by how "fundamental" they are to the agent's ontology.

Let me characterize systematicity with an example. Suppose you're a park ranger, and you know a huge number of facts about the animals that live in your park. One day you learn evolutionary theory for the first time, which helps explain a lot of the different observations you'd made. In theory, this could allow you to compress your knowledge: you could forget some facts about animals, and still be able to rederive them later by reasoning backwards from evolutionary theory if you wanted to. But in practice, it's very helpful for you to have those facts readily available. So learning about evolution doesn't actually reduce the amount of knowledge you need to store.

What it does do, though, is help structure that knowledge. Now you have a range of new categories (like "costly signaling" or "kin altruism") into which you can fit examples of animal behavior. You'll be able to identify when existing concepts are approximations to more principled concepts, and figure out when you should be using each one. You'll also be able to generalize far better to predict novel phenomena - e.g. the properties of new animals that move into your park.
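As a loose illustration of this distinction (not from the original post), here is a minimal Python sketch of a "systematic" knowledge store: a small compressed theory sits at the top of the hierarchy, derived facts are cached in non-compressed form for fast recall, and any cached fact can be forgotten and later rederived from the theory. The names here (SystematicKnowledge, toy_theory) are hypothetical stand-ins, not anything from the post.

```python
# Hypothetical sketch: a "systematic" knowledge store. A compressed theory
# sits at the top of the hierarchy; derived facts are cached (non-compressed)
# for fast recall; any cached fact can be dropped and rederived later.

class SystematicKnowledge:
    def __init__(self, theory):
        self.theory = theory  # compressed, top-level representation (a callable)
        self.cache = {}       # non-compressed representations, kept for speed

    def learn(self, query):
        """Derive an answer from the theory (slow) and cache it for reuse."""
        self.cache[query] = self.theory(query)
        return self.cache[query]

    def recall(self, query):
        """Fast path if cached; otherwise fall back to rederivation."""
        if query in self.cache:
            return self.cache[query]
        return self.learn(query)

    def forget(self, query):
        """Compression: drop the cached fact, keeping only the theory."""
        self.cache.pop(query, None)


# Toy stand-in for "evolutionary theory" in the park-ranger example.
def toy_theory(query):
    answers = {"large peacock tails": "costly signaling of fitness"}
    return answers.get(query, "unknown")


kb = SystematicKnowledge(toy_theory)
kb.learn("large peacock tails")           # stored for rapid recall
kb.forget("large peacock tails")          # compressed away...
print(kb.recall("large peacock tails"))   # ...but rederivable from the theory
```

The toy makes the post's point concrete: forgetting a cached fact loses nothing permanently, since it can be rederived from the theory, but keeping the cache alongside the theory is what makes recall fast, so the agent ends up storing both.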
So let's replace premise 1 in my previous post with the claim that increasing intelligence puts pressure on representations to become more systematic. I don't think we're in a position where we can justify this in any rigorous way. But are there at least good intuitions for why this is plausible? One suggestive analogy: intelligent minds are like high-functioning organizations, and many of the properties you want in minds correspond to properties of such organizations:

1. You want disagreements between different people to be resolved...