LW - Decomposing Agency - capabilities without desires by owencb

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Decomposing Agency - capabilities without desires, published by owencb on July 11, 2024 on LessWrong.

What is an agent? It's a slippery concept with no commonly accepted formal definition, but informally the concept seems to be useful. One angle on it is Dennett's Intentional Stance: we think of an entity as being an agent if we can more easily predict it by treating it as having some beliefs and desires which guide its actions. Examples include cats and countries, but the central case is humans. The world is shaped significantly by the choices agents make.

What might agents look like in a world with advanced - and even superintelligent - AI? A natural approach for reasoning about this is to draw analogies from our central example: picture what a really smart human might be like, and then try to figure out how it would be different if it were an AI. But this approach risks baking in subtle assumptions - things that are true of humans, but need not remain true of future agents.

One such assumption that is often implicitly made is that "AI agents" is a natural class, and that future AI agents will be unitary - that is, the agents will be practically indivisible entities, like single models. (Humans are unitary in this sense, and while countries are not unitary, their most important components - people - are themselves unitary agents.)

This assumption seems unwarranted. While people certainly could build unitary AI agents, and there may be some advantages to doing so, unitary agents are just an important special case among a large space of possibilities for:

- Components which contain important aspects of agency (without necessarily themselves being agents);
- Ways to construct agents out of separable subcomponents (none, some, or all of which may reasonably be regarded as agents in their own right).

We'll begin an exploration of these spaces. We'll consider four features we generally expect agents to have[1]:

- Goals: things they are trying to achieve, e.g. "I would like a cup of tea".
- Implementation capacity: the ability to act in the world, e.g. "I have hands and legs".
- Situational awareness: understanding of the world (relevant to the goals), e.g. "I know where I am, where the kettle is, and what it takes to make tea".
- Planning capacity: the ability to choose actions to effectively further their goals, given their available action set and their understanding of the situation, e.g. "I'll go downstairs and put the kettle on".

We don't necessarily expect to be able to point to these things separately - especially in unitary agents they could exist in some intertwined mess. But we kind of think that in some form they have to be present, or the system couldn't be an effective agent. And although these features are not necessarily separable, they are potentially separable - in the sense that there exist possible agents where they are kept cleanly apart.

We will explore possible decompositions of agents into pieces which contain different permutations of these features, connected by some kind of scaffolding. We will see several examples where people naturally construct agentic systems in ways where these features are provided by separate components. And we will argue that AI could enable even fuller decomposition. We think it's pretty likely that by default advanced AI will be used to create all kinds of systems across this space. (But people could make deliberate choices to avoid some parts of the space, so "by default" is doing some work here.)

A particularly salient division is that there is a coherent sense in which some systems could provide useful plans towards a user's goals, without in any meaningful sense having goals of their own (or conversely, have goals without any meaningful ability to create plans to pursue those goals). In thinking about ensuring the safety of advanced AI systems, it ma...
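To make the decomposition concrete, here is a minimal sketch (an editor's illustration, not code from the post) of the four features as separable components connected by scaffolding. All class and method names are illustrative assumptions, and the planning logic is a stub standing in for whatever a real system would use (an LLM call, a search algorithm, etc.).

```python
# Sketch of the four-feature decomposition: goals, situational awareness,
# planning capacity, and implementation capacity as separable components
# wired together by scaffolding. Names are illustrative assumptions.

from dataclasses import dataclass, field


@dataclass
class Goals:
    """What the system is trying to achieve, e.g. 'have a cup of tea'."""
    desires: list[str] = field(default_factory=lambda: ["have a cup of tea"])


@dataclass
class SituationalAwareness:
    """The system's goal-relevant model of the world."""
    facts: dict[str, str] = field(default_factory=lambda: {
        "location": "upstairs",
        "kettle": "in the kitchen, downstairs",
    })


class Planner:
    """Planning capacity: maps (goals, world model) to a sequence of actions.
    Note this component has no goals of its own; it plans toward whatever
    goals it is handed."""

    def plan(self, goals: Goals, world: SituationalAwareness) -> list[str]:
        # Stub logic; a real planner might call an LLM or a search algorithm.
        if "have a cup of tea" in goals.desires:
            return ["go downstairs", "fill the kettle", "boil water", "brew tea"]
        return []


class Actuators:
    """Implementation capacity: the ability to act in the world."""

    def execute(self, action: str) -> None:
        print(f"executing: {action}")


def scaffold(goals: Goals, world: SituationalAwareness,
             planner: Planner, actuators: Actuators) -> None:
    """Scaffolding that composes the components into an agentic system.
    No single part need be an agent; the composition is what acts."""
    for action in planner.plan(goals, world):
        actuators.execute(action)


if __name__ == "__main__":
    scaffold(Goals(), SituationalAwareness(), Planner(), Actuators())
```

In this sketch the Planner is the "capabilities without desires" case from the title: it produces useful plans toward goals supplied by a user, while the goals themselves live in a separate, inert data structure, and only the scaffold loop turns the pieces into something agentic.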