AF - [Aspiration-based designs] 2. Formal framework, basic algorithm by Jobst Heitzig

29:21
 
Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.
Welcome to The Nonlinear Library, where we use text-to-speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Aspiration-based designs] 2. Formal framework, basic algorithm, published by Jobst Heitzig on April 28, 2024, on The AI Alignment Forum.

Summary. In this post, we present the formal framework we adopt throughout the sequence and the simplest form of the type of aspiration-based algorithm we study. We do this for a simple form of aspiration-type goal: making the expectation of some variable equal to some given target value. The algorithm is based on the idea of propagating aspirations over time, and we prove that it gives a performance guarantee if the goal is feasible. Later posts discuss safety criteria, other types of goals, and variants of the basic algorithm.

Assumptions. In line with the working hypotheses stated in the previous post, we assume more specifically the following in this post:

- The agent is a general-purpose AI system that is given a potentially long sequence of tasks, one by one, which it does not know in advance. Most aspects of what we discuss focus on the current task only, but some relate to the fact that there will be further, unknown tasks later (e.g., the question of how much power the agent should aim to retain at the end of the task).
- It possesses an overall world model that represents a good enough general understanding of how the world works.
- Whenever the agent is given a task, an episode begins and its overall world model provides it with a (potentially much simpler) task-specific world model that represents everything relevant for the period until the agent gets a different task or is deactivated, and that can be used to predict the potentially stochastic consequences of taking certain actions in certain world states.
- That task-specific world model has the form of a (fully observed) Markov Decision Process (MDP) which, however, does not contain a reward function R but instead what we call an evaluation function related to the task (see the second-to-next bullet point).
- As a consequence of a state transition, i.e., of taking a certain action a in a certain state s and finding itself in a certain successor state s', a certain task-relevant evaluation metric changes by some amount. Importantly, we do not assume that the evaluation metric inherently encodes things of which more is better; e.g., the evaluation metric could be global mean temperature, a client's body mass, the x coordinate of the agent's right thumb, etc.
- We call the step-wise change in the evaluation metric the received Delta in that time step, denoted δ, and its cumulative sum over all time steps of the episode the Total, denoted τ. Formally, Delta and Total play a similar role for our aspiration-based approach as "reward" and "return" play for maximization-based approaches. The crucial difference is that our agent is not tasked to maximize Total (since the evaluation metric does not have the interpretation of "more is better") but to aim for some specific value of the Total.
- The evaluation function contained in the MDP specifies the expected value of δ for all possible transitions, E_δ(s,a,s').[1] (A small code sketch of this setup follows below.)
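To make the assumptions above concrete, here is a minimal sketch in Python of such a task-specific world model: a small, fully observed, finite-horizon MDP that carries an evaluation function (playing the role of E_δ) instead of a reward function, together with a rollout that accumulates the received Deltas into the Total. The class and function names (EvaluationMDP, run_episode, etc.) are our own illustrative choices, not notation from the post, and for simplicity the realized Delta of a transition is taken to equal its expected value.

```python
import random

class EvaluationMDP:
    """Toy task-specific world model: an MDP without a reward function,
    but with an evaluation function giving the expected Delta of each
    transition (s, a, s')."""

    def __init__(self, transitions, e_delta, horizon):
        # transitions[s][a] = list of (probability, successor_state) pairs;
        # every reachable state is assumed to appear as a key of `transitions`
        self.transitions = transitions
        # e_delta[(s, a, s_next)] = expected change of the evaluation metric
        self.e_delta = e_delta
        # fixed, finite episode length (number of time steps)
        self.horizon = horizon

    def step(self, state, action):
        """Sample a successor state and return it with the received Delta.
        For simplicity the Delta equals its expected value E_delta(s, a, s')."""
        probs = [p for p, _ in self.transitions[state][action]]
        succs = [s2 for _, s2 in self.transitions[state][action]]
        s_next = random.choices(succs, weights=probs)[0]
        return s_next, self.e_delta[(state, action, s_next)]


def run_episode(mdp, policy, s0):
    """Roll out one episode; the Total (tau) is the cumulative sum of Deltas."""
    state, total = s0, 0.0
    for t in range(mdp.horizon):
        state, delta = mdp.step(state, policy(state, t))
        total += delta
    return total
```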
First challenge: guaranteeing the fulfillment of expectation-type goals. The challenge in this post is to design a decision algorithm for tasks where the agent's goal is to make the expected (!) Total equal (!) a certain value E∈R, which we call the aspiration value.[2] This is a crucial difference from a "satisficing" approach that would aim to make the expected Total at least as large as E and would thus still be happy to maximize the Total. Later we consider other types of tasks, both less restrictive ones (including those related to satisficing) and more specific ones that also care about other aspects of the resulting distribution of the Total or of states. It turns out that we can guarantee the fulfillment of this type of goal under some weak condit...
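As a hedged illustration of the "propagating aspirations" idea for such an expectation-type goal, the sketch below builds on the EvaluationMDP sketch above. By backward induction it computes, for every state, action, and time step, the smallest and largest achievable expected remaining Total; it then mixes a "low" and a "high" action with exactly the probability that makes the expectation hit the current aspiration, and hands a correspondingly rescaled aspiration on to the successor state. The names (q_bounds, run_aspiration_episode, ...) are illustrative assumptions, and this is only one possible concrete realization under our assumed toy interface, not necessarily the exact algorithm derived in the post.

```python
import random

def q_bounds(mdp):
    """Backward induction: smallest and largest achievable expected remaining
    Total for every (state, action, time) and every (state, time)."""
    T = mdp.horizon
    states = list(mdp.transitions)
    v_min = {(s, T): 0.0 for s in states}
    v_max = {(s, T): 0.0 for s in states}
    q_min, q_max = {}, {}
    for t in range(T - 1, -1, -1):
        for s in states:
            for a in mdp.transitions[s]:
                q_min[(s, a, t)] = sum(p * (mdp.e_delta[(s, a, s2)] + v_min[(s2, t + 1)])
                                       for p, s2 in mdp.transitions[s][a])
                q_max[(s, a, t)] = sum(p * (mdp.e_delta[(s, a, s2)] + v_max[(s2, t + 1)])
                                       for p, s2 in mdp.transitions[s][a])
            v_min[(s, t)] = min(q_min[(s, a, t)] for a in mdp.transitions[s])
            v_max[(s, t)] = max(q_max[(s, a, t)] for a in mdp.transitions[s])
    return q_min, q_max, v_min, v_max


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def run_aspiration_episode(mdp, s0, aspiration):
    """Act so that the *expected* Total equals `aspiration`, provided the
    aspiration lies in the feasible interval [v_min(s0, 0), v_max(s0, 0)]."""
    q_min, q_max, v_min, v_max = q_bounds(mdp)
    s = s0
    asp = clip(aspiration, v_min[(s0, 0)], v_max[(s0, 0)])
    total = 0.0
    for t in range(mdp.horizon):
        # Bracket the current aspiration with a "low" and a "high" action ...
        a_lo = min(mdp.transitions[s], key=lambda a: q_min[(s, a, t)])
        a_hi = max(mdp.transitions[s], key=lambda a: q_max[(s, a, t)])
        asp_lo = clip(asp, q_min[(s, a_lo, t)], q_max[(s, a_lo, t)])
        asp_hi = clip(asp, q_min[(s, a_hi, t)], q_max[(s, a_hi, t)])
        # ... and mix them with the probability that makes the expectation hit `asp`.
        p_lo = 1.0 if asp_hi == asp_lo else (asp_hi - asp) / (asp_hi - asp_lo)
        a, a_asp = (a_lo, asp_lo) if random.random() < p_lo else (a_hi, asp_hi)
        # Propagate the action-aspiration to the successor state by keeping its
        # relative position inside the action's feasibility interval.
        lo, hi = q_min[(s, a, t)], q_max[(s, a, t)]
        lam = 0.0 if hi == lo else (a_asp - lo) / (hi - lo)
        s, delta = mdp.step(s, a)
        asp = v_min[(s, t + 1)] + lam * (v_max[(s, t + 1)] - v_min[(s, t + 1)])
        total += delta
    return total
```

If the initial aspiration lies in the feasible interval [v_min(s0, 0), v_max(s0, 0)], the expectation of the returned Total equals the aspiration; this can be checked numerically by averaging run_aspiration_episode over many episodes on a small example.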