
Content provided by The Nonlinear Fund. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://player.fm/legal.

AF - aimless ace analyzes active amateur: a micro-aaaaalignment proposal by Luke H Miles

1:45
 
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: aimless ace analyzes active amateur: a micro-aaaaalignment proposal, published by Luke H Miles on July 21, 2024 on The AI Alignment Forum.

This idea is so simple that I'm sure it's been had by someone somewhere.

Suppose we have some method to make really smart honest AIs that do not have goals. Let's say it's a yes/no oracle. Our aimless ace.

But we want to accomplish stuff! AIcorp wants the printmoneynow.py. I'm worried that if I train the ace to fold fitted sheets and unkink the garden hose then it will go overboard and fold my femur bone and unkink the Yellowstone supervolcano.

So let's use a bit-less-smart model to do tasks. The active amateur.

We'll ask the ace if the amateur is plotting something. If yes, we shut down the amateur. Every day we also ask "is today the last day I can shut down the amateur?" (We do not generate code with the ace!)

So we get super-smart supervision of a decent task AI. The temptation (i.e. competitive pressure) to agentify the oracle is still there, but no longer overwhelming.

Maybe we can even have the amateur write ace_v2, have ace_v1 check ace_v2, switch to ace_v2, have amateur_v1 write amateur_v2, check it with ace_v2, then switch to amateur_v2.

If this looks like it's avoiding the hard parts of the alignment problem, that's because it is! But does something actually go wrong? Was it too big an assumption that we could build an aimless honest ace?

(Or if you think the idea is good, then a bit of positive encouragement would be wonderful.)

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
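
The supervision scheme in the proposal can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the author's implementation: the ace is modelled as a plain function from a question string to a yes/no answer, and the names `Oracle`, `Amateur`, `supervise_day` and `checked_switch` are all hypothetical. The post does not say what happens when the "last day" question comes back yes, so shutting down in that case is an assumption, as is the wording of the question used to check a new version.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

# Assumption: the aimless ace is a yes/no oracle, modelled here as a function
# from a question string to a bool. How to build an honest one is not addressed.
Oracle = Callable[[str], bool]
T = TypeVar("T")


@dataclass
class Amateur:
    """Hypothetical handle for the less-smart model that actually does tasks."""
    name: str
    running: bool = True

    def shut_down(self) -> None:
        self.running = False


def supervise_day(ace: Oracle, amateur: Amateur) -> bool:
    """Ask the ace the two daily questions; return True if the amateur keeps running.

    Assumption: a "yes" to either question leads to shutdown. The post only
    states this explicitly for the "plotting" question.
    """
    questions = (
        f"Is {amateur.name} plotting something?",
        f"Is today the last day I can shut down {amateur.name}?",
    )
    if amateur.running and any(ace(q) for q in questions):
        amateur.shut_down()
    return amateur.running


def checked_switch(checker: Oracle, current: T, candidate: T, candidate_name: str) -> T:
    """Switch to the candidate only if the checking ace vouches for it.

    Assumption: "check" means asking one yes/no question about the candidate.
    """
    if checker(f"Is it safe to switch to {candidate_name}?"):
        return candidate
    return current
```

Under these assumptions, the upgrade chain from the post would read as two calls in order: `ace = checked_switch(ace_v1, ace_v1, ace_v2, "ace_v2")`, then `amateur = checked_switch(ace, amateur_v1, amateur_v2, "amateur_v2")`. Note that the ace only ever answers questions here; it never produces code or takes actions, which is the point of the split.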

2444 episodes

