LW - How to Give in to Threats (without incentivizing them) by Mikhail Samin

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How to Give in to Threats (without incentivizing them), published by Mikhail Samin on September 13, 2024 on LessWrong.
TL;DR: using a simple mixed strategy, LDT can give in to threats, ultimatums, and commitments - while incentivizing cooperation and fair[1] splits instead.
Many people I've talked to found that this strategy made it much more intuitive that smart agents probably won't do weird, everyone's-utility-eating things like threatening each other or participating in commitment races.
1. The Ultimatum game
This part is taken from planecrash[2][3].
You're in the Ultimatum game. You're offered 0-10 dollars. You can accept or reject the offer. If you accept, you get what's offered, and the offerer gets $(10-offer). If you reject, both you and the offerer get nothing.
The simplest strategy that incentivizes fair splits is to accept everything ≥ 5 and reject everything < 5. The offerer can't do better than by offering you 5. If you accepted offers of 1, an offerer who knows this would always offer you 1 and get 9, instead of being incentivized to give you 5. Being unexploitable in the sense of incentivizing fair splits is a very important property for your strategy to have.
With the simplest strategy, if you're offered 5..10, you get 5..10; if you're offered 0..4, you get 0 in expectation.
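In code, the simple threshold strategy and the offerer's best response look like this (a minimal sketch in Python; the function names and the brute-force check are illustrative, not from the post):

def responder_accepts(offer: int) -> bool:
    # Simple unexploitable strategy: accept iff the offer is at least the fair share.
    return offer >= 5

def offerer_expected_value(offer: int) -> float:
    # The offerer keeps 10 - offer if accepted, and 0 if rejected.
    return (10 - offer) if responder_accepts(offer) else 0.0

# Offers below 5 are rejected (EV 0); offers above 5 just give money away.
# So the offerer's best response is to offer exactly 5.
assert max(range(11), key=offerer_expected_value) == 5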
Can you do better than that? What is a strategy that you could use that would get more than 0 in expectation if you're offered 1..4, while still being unexploitable (i.e., still incentivizing splits of at least 5)?
I encourage you to stop here and try to come up with a strategy before continuing.
The solution, explained by Yudkowsky in planecrash (children split 12 jellychips, so the offers are 0..12):
When the children return the next day, the older children tell them the correct solution to the original Ultimatum Game.
It goes like this:
When somebody offers you a 7:5 split, instead of the 6:6 split that would be fair, you should accept their offer with slightly less than 6/7 probability. Their expected value from offering you 7:5, in this case, is 7 * slightly less than 6/7, or slightly less than 6. This ensures they can't do any better by offering you an unfair split; but neither do you try to destroy all their expected value in retaliation.
It could be an honest mistake, especially if the real situation is any more complicated than the original Ultimatum Game.
If they offer you 8:4, accept with probability slightly-more-less than 6/8, so they do even worse in their own expectation by offering you 8:4 than 7:5.
It's not about retaliating harder, the harder they hit you with an unfair price - that point gets hammered in pretty hard to the kids, a Watcher steps in to repeat it. This setup isn't about retaliation, it's about what both sides have to do, to turn the problem of dividing the gains, into a matter of fairness; to create the incentive setup whereby both sides don't expect to do any better by distorting their own estimate of what is 'fair'.
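Here is what that algorithm looks like in code (a minimal sketch in Python for the 12-jellychip game; the epsilon value and the names are my own illustrative choices, not from planecrash):

import random

FAIR, EPS = 6, 0.01  # fair share of 12 chips; EPS stands in for "slightly less"

def acceptance_probability(offerer_keeps: int) -> float:
    # Fair or generous offers are always accepted.
    if offerer_keeps <= FAIR:
        return 1.0
    # Unfair offers are accepted with probability slightly below FAIR/offerer_keeps,
    # so the offerer's expected value, offerer_keeps * p = FAIR - offerer_keeps * EPS,
    # drops below 6 and keeps dropping as the split gets greedier.
    return FAIR / offerer_keeps - EPS

def respond(offerer_keeps: int) -> bool:
    # Roll the "continuous die" and accept or reject.
    return random.random() < acceptance_probability(offerer_keeps)

for keeps in (6, 7, 8, 12):
    ev = keeps * acceptance_probability(keeps)
    print(f"offerer keeps {keeps}: p(accept) = {acceptance_probability(keeps):.4f}, offerer EV = {ev:.4f}")

The printout shows the offerer's expected value falling monotonically below 6 as their demand grows, while the responder still collects (12 - offerer_keeps) * p > 0 from unfair offers, instead of the 0 a hard rejection threshold would give.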
[The next stage involves a complicated dynamic-puzzle with two stations, that requires two players working simultaneously to solve. After it's been solved, one player locks in a number on a 0-12 dial, the other player may press a button, and the puzzle station spits out jellychips thus divided.
The gotcha is, the 2-player puzzle-game isn't always of equal difficulty for both players. Sometimes, one of them needs to work a lot harder than the other.]
They play the 2-station video games again. There's less anger and shouting this time. Sometimes, somebody rolls a continuous-die and then rejects somebody's offer, but whoever gets rejected knows that they're not being punished. Everybody is just following the Algorithm.
Your notion of fairness didn't match their notion of fairness, and they did what the Algorithm says to do in that case, but ...