## February 28, 2003

### Armen Alchian Plays the Repeated Prisoner's Dilemma

K. Harris asks for a link to an extended passage I posted from William Poundstone's (1992) marvelous book Prisoner's Dilemma (New York: Doubleday: 038541580X).

Here it is:

Warning: Professional economists will find this hilarious and thought-provoking. Others... others will probably simply find it bizarre, insane, incomprehensible, and weird. It comes from pp. 106-118 of Poundstone's book.

Posted by DeLong at February 28, 2003 09:33 AM

"What Poundstone means is that, since both players know that the supergame is going to last for 100 periods, there is no reason for people to cooperate in round 100 to induce subsequent cooperation. Hence--whatever else people do--the Nash equilibrium strategy must be to defect in period 100.

"But once you know that the other player will defect in period 100 no matter what you do, the same argument applies to period 99: whatever else people do, the Nash equilibrium strategy must be to defect in period 99.

"Thus the situation "unravels." As long as there is a known, certain last period the only Nash equilibrium is to defect, always, from the first period."

Hold on. Isn't that an unexpected egg argument? Or a surprise inspection? Or a surprise hanging? Or Senior Sneak Week? Or a Pop Quiz? ("Unexpected egg" is my favorite name for this, but no one else seems to use it. Was it one of your math problems?)

The Unexpected Egg is like this: I tell you that somewhere, in boxes 1 through 10, I have secreted one egg. Furthermore, I tell you that, if you open the boxes in sequence, you will be unable to deduce (using what I am telling you) where the egg is before you open that box.

You reason: "If the egg were in box 10, after I opened the first nine boxes, I would know that the egg must be in 10. That contradicts what he said. So the egg must be in 1 through 9. Well, given that the egg is in 1 through 9, if the egg were in 9, then I would know that after I opened boxes 1 through 8. That contradicts what he said. So the egg can't be in 9. So the egg must be in 1 through 8...."

Yet, when you open the boxes in sequence, you don't know that the egg is in box 5 until you open it.

I don't know what the problem with the Unexpected Egg is, but there obviously is one (we can play Unexpected Egg sometime, and you'll never know where the egg is before you open the box). Because I don't know what the problem is, I can't be sure that Poundstone's argument has the same problem.

But it seems eerily similar to me--you know that the last play has no effect on any future plays, so the equilibrium strategy is always to defect; so then you know that the penultimate play has no effect on any future plays, so the equilibrium strategy is to defect....

Posted by: Matt Weiner on February 28, 2003 11:10 AM

Player AA's comments are interesting in light of your colleague Matt Rabin's and others' work on fairness. Also, I wish McCabe, Camerer, et al. of the Postrel article had been around to wire up JW and AA's heads - an exciting light show, no doubt!

Posted by: Raymond Guiteras on February 28, 2003 12:04 PM

No, alas, it is not an "unexpected hanging" argument. To play "D" all 100 rounds is the only and unique Nash Equilibrium--the only situation in which neither player has an incentive to switch to a different strategy...

Posted by: Brad DeLong on February 28, 2003 01:31 PM

The only way for a C/C strategy to be optimal is if neither player knows when the game ends (for instance, if there is a fixed and high enough probability to have one more round after a given round).

Posted by: Kimon on February 28, 2003 01:44 PM

This particular type of match takes the dilemma out of prisoner's dilemma because changes in outcome are zero sum. There is only one winner and one loser no matter what and this kind of goes against the very idea of prisoner's dilemma.... which is probably why one of the players assumed it wasn't zero sum.

Hopefully nobody will read this and think that game theory tells us to be selfish so that we may die with more possessions (or experiences) than our neighbors. What is central in economics is that everyone can get more. But this example does point out there is not too much incentive to be nice to passing strangers. Well.... we do have laws to enforce mutually beneficial outcomes. And there's revenge too. So keep your guard up and try to take advantage of passing strangers but don't get caught.

Posted by: snsterling on February 28, 2003 01:51 PM

I recently played a game in which we were asked to pick a number, from 0 to 100. If we picked the average number, we would be awarded \$20. If we picked the average number * 2/3 we would be awarded \$20.

40 people randomly picking numbers would give us a number somewhere around 50, so this would be optimal. Yet some of those people would pick (50)*(2/3)=33, so 33 would also be optimal. Except that if half of them go for either optimal solution it brings the overall average down, which means that the new optimal solutions are 42 and 28. Except that...

Eventually we get to 0, which is the real optimal solution. If every one picked 0 all participants would receive \$40 as they had each divined both the average and the two-thirds average. Nash equilibrium is 0. (From 50 to 42 to 35 etc. is the "situation unravelling.")
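The unraveling from 50 toward 0 can be sketched numerically. This is a minimal sketch, assuming a simple "level-k" model that is not in the comment itself: a naive player guesses 50, and each further step of reasoning takes two-thirds of the previous guess. The function names are hypothetical.

```python
def level_k_guesses(start=50.0, factor=2 / 3, levels=20):
    """Guesses of players who apply the two-thirds step 0, 1, 2, ... times."""
    guesses = [start]
    for _ in range(levels):
        guesses.append(guesses[-1] * factor)
    return guesses


def mixed_population_targets(guess_a, guess_b, factor=2 / 3):
    """Average and two-thirds of the average when half the players pick each guess."""
    average = (guess_a + guess_b) / 2
    return average, average * factor


guesses = level_k_guesses()
print([round(g, 1) for g in guesses[:5]])  # each step shrinks the guess by a third

# Half the room at 50, half at 50 * 2/3: the "42 and 28" step in the comment.
avg, two_thirds = mixed_population_targets(guesses[0], guesses[1])
print(round(avg), round(two_thirds))  # 42 28
```

Under these assumptions the sequence shrinks geometrically, so it only approaches 0 in the limit; where it stops in practice depends on how many steps the other players actually take.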

Of course, I didn't pick 0. I picked 39. I reasoned that at each iteration half the participants would realize what was going on and would adjust their guesses accordingly. As the players' logic situation unraveled, the number would settle lower than 50, but much higher than 0.

In the end, the numbers were 41 and 27. I was only slightly too optimistic about the other participants' ability to play the game (though they would have lodged the same complaint of me had the number been only 4 lower.)

What I came away with was that the situation unravels in Nash equilibrium games when people are incapable of identifying rational self-interest.

The real clincher would be to assess the participants before the game, so it is not too surprising that Daniel Kahneman and Vernon L. Smith shared the Nobel Prize last year.

Posted by: Saam Barrager on February 28, 2003 03:44 PM

I am most certainly confused. Nash seems to think that we should calculate the equilibrium in terms of strategies rather than moves. How do you represent this mathematically, given that part of your strategy may be trying to figure out what the other person's strategy is?

Also let me second--this was hilarious. Don't know who Armen Alchian is, though (but I'm googling).

Posted by: Matt Weiner on February 28, 2003 04:23 PM

Kimon wrote:

>The only way for a C/C strategy to be optimal is if neither player knows when the game ends (for instance, if there is a fixed and high enough probability to have one more round after a given round).

If there is a fixed (but unknown) number of iterations, that's still enough to wreck the case for cooperation isn't it? As soon as I know there will definitely be a last game, the argument for shafting my opponent propagates back from it all the way to the current game.

Less sure about the other proposal (no fixed number of games, some probability of a further game after the current one), but shouldn't each player reason that the probability of the series continuing into eternity is sufficiently low that it's far safer to bet that there will be a last game?

The players' comments are indeed very, very funny.

Posted by: Tom Runnacles on February 28, 2003 04:30 PM

If I remember right, it is not enough that there "definitely be a last game" to make defection an equilibrium.

If there is uncertainty about how many rounds there are (even if it is known that the number is finite) then the participant should weight the outcome of (say) the hundredth round by the probability of the hundredth round actually being played. The "final round" loss may not be significant if the probability is small enough.

The weighting factor is usually called a "discount factor". In other circumstances, it may be a reflection of a player's patience or shortsightedness.
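The weighting argument can be made concrete with a small calculation. This is a minimal sketch under two assumptions that are not in the comment above: the punishment is "grim trigger" (cooperate until the first defection, then defect forever), and the payoffs are illustrative textbook values. With probability `delta` of another round, cooperating forever is worth R/(1-delta), while defecting once and being punished is worth T + delta*P/(1-delta).

```python
def cooperation_value(reward, delta):
    """Expected total from mutual cooperation in every round that gets played."""
    return reward / (1 - delta)


def deviation_value(temptation, punishment, delta):
    """Expected total from defecting now and facing mutual defection afterwards."""
    return temptation + delta * punishment / (1 - delta)


def critical_delta(temptation, reward, punishment):
    """Smallest continuation probability at which cooperation is worth keeping."""
    return (temptation - reward) / (temptation - punishment)


# Hypothetical payoffs: T = temptation, R = mutual cooperation, P = mutual defection.
T, R, P = 5.0, 3.0, 1.0

print(critical_delta(T, R, P))  # 0.5
for delta in (0.4, 0.6):
    print(delta, cooperation_value(R, delta) >= deviation_value(T, P, delta))
```

With these numbers cooperation holds up at a continuation probability of 0.6 but not at 0.4, which is the sense in which a sufficiently unlikely "final round" loss stops mattering.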

Posted by: Tom Slee on February 28, 2003 04:38 PM

>>If there is a fixed (but unknown) number of iterations, that's still enough to wreck the case for cooperation isn't it?<<

I think that it is not enough that the game end with probability one. There has to be a *known* time T by which you know the game will have ended.

But this is not my field...

And I am most struck not by the details of the "unraveling" argument, but by the fact that game theorist John D. Williams, author of _The Compleat Strategyst_, thinks that he can train Armen Alchian to cooperate--at least until you hit the high nineties rounds.

Posted by: Brad DeLong on February 28, 2003 05:41 PM

"I am most certainly confused. Nash seems to think that we should calculate the equilibrium in terms of strategies rather than moves. How do you represent this mathematically, given that part of your strategy may be trying to figure out what the other person's strategy is?"

There are two different games being mixed up here.
If the game is a competition to see who the one winner is then the answer is to end it in a tie by never cooperating. This is no different from the one-iteration prisoner's dilemma. It doesn't involve interaction because any player would be crazy not to defect.

If the goal is simply to achieve high scores, but not necessarily compared to anyone else's score, then the strategy in this game is more a matter of investment. You take an investment risk by playing a "C". Knowing that your partner (not an opponent in this case!) has a similar interest in receiving "C" from you, you play some form of initially cooperate followed by revenge as needed. If the investment doesn't pay off for you then you refuse to risk again such an investment for some period. One good strategy might be a couple of C's to initiate cooperation, followed by a string of D's if not received, and then try to reinitiate cooperation with 2 C's again.

The discount factor in this case only occurs in the last few rounds of the game because the investment horizon is very short. Payback by receiving C's occurs almost immediately. So round 100 you should definitely defect. Maybe even 99 I'm not sure. I'm sure John Nash would advise you not to steal the towels from the hotel until you were ready to check out (and pay with cash or else more rounds might be added).

Posted by: snsterling on February 28, 2003 07:27 PM

Just to clarify...

It is the payoff table which determines the discount rate. The following would have a lower discount rate and thus the non-cooperation would begin much earlier. Think for example, if we knew the world was going to end in 5 years, what is a fair P/E for Microsoft, or would you build a factory based on 20 year depreciation?

| Moves | Player 1 | Player 2 |
|-------|----------|----------|
| CC    | +5       | +5       |
| CD    | 0        | +9       |
| DC    | +9       | 0        |
| DD    | +4       | +4       |

I think this does it, someone should double check it though... Here it takes 5 DDs as compared to CCs to take revenge on someone who profited 4 by stabbing you in the back (CD). (9+5*4 < 6*5). So you've got to play longer strings of D to enforce cooperative behavior. By round 96 you might as well play D though, because 9+4+4+4+4 = 5+5+5+5+5
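The arithmetic can be double-checked directly. This is a rough sketch using the payoffs listed in this comment (5 for CC, 9 for a successful defection, 4 for DD); the function names, and the framing of "defect once, then sit through n rounds of DD", are assumptions about what is meant.

```python
CC, DC, DD = 5, 9, 4  # payoff to the defector's side: mutual C, one-sided D, mutual D


def defect_then_punished(n):
    """Total from one successful defection followed by n rounds of DD."""
    return DC + n * DD


def keep_cooperating(n):
    """Total from playing CC over the same n + 1 rounds."""
    return (n + 1) * CC


# Shortest punishment that makes the defection strictly unprofitable.
shortest = next(n for n in range(1, 100) if defect_then_punished(n) < keep_cooperating(n))
print(shortest)  # 5

# With only four rounds left after the defection, the two totals are equal.
print(defect_then_punished(4), keep_cooperating(4))  # 25 25
```

So the figures in the comment hold: five DDs are needed to make the defection a net loss (29 against 30), and with four rounds of punishment the defector merely breaks even.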

Posted by: snsterling on February 28, 2003 08:56 PM

Where did you blog this previously?

It is indeed a marvelous book ... scattered through it is a most instructive partial history of a previous US encounter with "preventive war" mentality, in the course of which eminent logician and uber-peacenik Bertrand Russell became convinced we had no rational choice but to undertake a massive nuclear strike on the USSR just to reset the clock.

Posted by: RonK, Seattle on February 28, 2003 10:41 PM

OK, I'm going to keep playing the fool in the market.

snsterling--I took it that the claim is that, even if you view the game as non-competitive rather than competitive, the only Nash equilibrium strategy is to defect all the time.*

The argument, isn't it, is that there's no incentive to cooperate when you know that your cooperation won't have any effect on the next person's future play?

Which means there's no incentive to cooperate in the last round, because there's no future play to affect.

Then there's no incentive to cooperate in the next-to-last round, because you know the other player will defect in the last round anyway.

etc.
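The round-by-round argument can be sketched as a short backward pass. This is a minimal sketch with illustrative payoffs (not the ones used in the experiment) and hypothetical function names. It leans on one simplification: once the later rounds are pinned down whatever happens now, their value is a constant added to every option, so only the current round's payoff decides the move.

```python
MOVES = ("C", "D")

# Illustrative payoffs to "me", keyed by (my_move, their_move); assumed values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def dominant_move(continuation):
    """Return the move that is a best reply to every opposing move, or None."""
    best_replies = set()
    for theirs in MOVES:
        # The continuation value is the same for every option, so it never changes the ranking.
        best = max(MOVES, key=lambda mine: PAYOFF[(mine, theirs)] + continuation)
        best_replies.add(best)
    return best_replies.pop() if len(best_replies) == 1 else None


def backward_induction(rounds):
    """Work back from the last round, recording the dominant move in each."""
    plan = {}
    continuation = 0  # nothing is played after the final round
    for t in range(rounds, 0, -1):
        move = dominant_move(continuation)
        if move is None:
            raise ValueError("no dominant move in this stage game")
        plan[t] = move
        # Both players reason the same way, so this round pays the (move, move) payoff.
        continuation += PAYOFF[(move, move)]
    return plan


plan = backward_induction(100)
print(plan[100], plan[99], plan[1])  # D D D
```

The sketch only covers the case of a known last round; with no known end there is no final round from which to start the backward pass.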

So as I understand it, Brad is right--if you don't know when the game will end, there never comes a time when you're sure that your current actions won't affect the next round, because you always think there may be a next round.

This all assumes that you've successfully signalled your intent to follow the tit-for-tat strategy, so that the other player knows that his play will affect your next play, if there is one. And perhaps raises the question: Why not signal your intention to follow tit-for-tat, even unto the last round, when it won't have any effects on future cooperation?

The answer: Because when you come to decide what to do in the last round, you'll be better off playing D, no matter what the other player is playing. Admittedly, the other player (let us assume) will play D if he thinks you'll play D. So the question is, should you play C if (1) you will benefit if the other guy thinks you will play C (2) holding his response fixed, you will benefit if you play D (3) what you actually play, as opposed to what he thinks you'll play, can't affect his play?

Great guns, it's Newcomb's paradox again. (I regret that I can't find and link Brad's blog entry on this. It's the one with the alien with perfect predicting powers.)

*Although there's one point where JW seems to think that AA is playing the competitive game.

Posted by: Matt Weiner on March 1, 2003 06:31 AM

This source has some examples of non-constant and zero sum games cooperative and not cooperative:
http://william-king.www.drexel.edu/top/eco/game/nash.html

Nash equilibrium has as part of its definition that the strategies must not change the action unless there is a unilateral reason for doing so. I have read that many believe his mental illness gave him an edge up on viewing things in this unilateral fashion. It is useful to analyze competitive situations in this way because one cannot coordinate multilateral strategies (unless they are in the vitamin business apparently).

However, it is not particularly useful in this two player game of iterated non-constant sum P.D. because there are only two players and they can develop an "understanding" of each other.

The unilateral method of computing this game by starting in round 100 and deducing backwards leads to the same zero economic profit result that occurs in free competition and in many situations without enforcement of a global maximum.

In the recent nightclub fire the Nash equilibrium was for everyone to try to trample the person in front of them to get out more quickly. The result was a pile of horizontally stacked people.

For the 100-round P.D., there is ample reason to risk a C or two just to see if you can hook up with a cooperator. At least as long as the utility of point accumulation is not funky (if loss of one round leads to death for instance, then one would never play C).

If you lose trust in the partner, you can always switch to playing D for the rest of the game. Whether or not you risk C again (or in the first place) depends upon your assessment of their strategy. But the rule of thumb here is, at least check to see if you're up against a cooperator, because the loss of a few rounds is not so horrible.

The reason not to defect in round 90 after developing a relationship of trust is quite straightforward (and I mean that literally... it must be calculated from that point forward and not from round 100 backward). If I defect in 90 I shoot myself in the foot because the other will employ enforcement for several rounds and my crime will not pay. We hold a power over each other and we both know it.

Which is why revenge is such an important principle in law enforcement. If someone commits a crime we might be able to save money by deporting them instead of jailing them. Either way they can't commit the same crime here again. But now we will have developed a relationship with would-be criminals who might not mind living abroad.

Posted by: snsterling on March 1, 2003 10:29 AM

Play it online!

Posted by: snsterling on March 1, 2003 01:41 PM

Posted by: K Harris on March 3, 2003 06:27 AM

AA and JW are playing different games.

JW's strategy is to punish AA because AA won't cooperate in choosing CC (AA, JW) all the time. JW sees that a CC strategy pays 50 and 100 (AA, JW), so their total payout is 150, the highest possible. JW also reasons that the only way AA is going to exceed 50 is if JW is a total sucker, so playing CC is AA's best strategy for total gains. The fact that JW receives 100 from this strategy while AA only receives 50 is immaterial, since he does not think AA will ever get more than 50. JW also reasons that every time AA defects, it signals an unwillingness to reach their shared goal of perfect CC.

AA has a different strategy. AA is well aware of the benefits of CC, but also knows that he does better with a few defections to more equitably share the benefits of cooperation. A strategy of 100 CC pays 50 and 100, but a strategy of 80 CC and 20 DC pays 60 and 60, for a total of 120. The cost of 40 to JW allows an equal distribution of the gains from cooperation. This is still in excess of JW's gains of 50 from DD, so it provides a minor incentive to cooperate.
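The totals above can be reproduced from per-round payoffs. This is a small sketch in which the per-round figures are inferred from the totals quoted in this comment rather than stated in it: CC pays AA 0.5 and JW 1, and a round where AA defects while JW cooperates pays AA 1 and JW -1. The function name is hypothetical.

```python
# Per-round payoffs as (AA, JW), inferred from the 100-round totals in the comment.
CC = (0.5, 1.0)   # both cooperate
DC = (1.0, -1.0)  # AA defects, JW cooperates


def totals(n_cc, n_dc):
    """Return (AA total, JW total) for n_cc rounds of CC and n_dc rounds of DC."""
    aa = n_cc * CC[0] + n_dc * DC[0]
    jw = n_cc * CC[1] + n_dc * DC[1]
    return aa, jw


print(totals(100, 0))  # (50.0, 100.0)
print(totals(80, 20))  # (60.0, 60.0)
```

The joint total falls from 150 to 120, and the whole 40-point drop on JW's side buys AA only 10 extra points.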

JW's comments reveal that he is totally oblivious to AA's goal of equitable distribution. He concludes at move 10 that he can keep 50 or give 50 to AA while reaping 100. But AA's repeated defections during sustained periods of cooperation, and his comments at 32, indicate that he is willing to cooperate to give JW more than the 50 that DD provides, but he wants a share of the gains transferred to him. Yet defection by AA is met with derision and punishment, at 16, 26, 38, 49, 67.

JW has approached his strategy as strict tit-for-tat, seeking to ensure cooperation because he does not believe that cooperation will occur otherwise. But AA has based his strategy (after the initial 10-move effort to establish cooperation) on an assumption that cooperation will occur, and that the only real question is how to distribute its benefits when the only distributive mechanism is to defect occasionally. Despite JW's denunciations, AA actually has MORE faith in cooperation than JW.

Either one would blame the other for a failure to cooperate - but they are seeking cooperation in different tasks.

Posted by: Ethan on March 3, 2003 10:16 AM

"....the only real question is how to distribute its benefits...."

Ethan-

You are right, I did not even notice that AA is trying to steal and expecting JW to have pity on him. But I don't see how it's justified.

AA is altering the payout system because he is willing to sacrifice his own points to establish equality. In some sense this is similar to a zero sum outlook because he is comparing what he has to what the other guy has.

JW is justified to insist on the mutually beneficial collaboration every turn. In fact, JW might also wish to drive a harder bargain with AA so that he might receive a larger share of the cooperative benefit.

Is there a way to show that the fair bargain here is to split the benefit 50-50 because they each have the same to gain from DD to CC ?? What if one had more to lose than the other from a DD vs a CC ??

Posted by: snsterling on March 3, 2003 09:27 PM

"Is there a way to show that the fair bargain here is to split the benefit 50-50"

Probably not. How would you decide "fairness"?

AA and JW both gain 50 from switching DD to CC, and as a percentage, AA's gain is larger (zero to 50, vs. 50 to 100). The game is structured so that in all outcomes, except AA defection, JW reaps a bigger reward. Even JW's defection is more lucrative. The game does not make for "fair" outcomes.

While each round of AA-defect shifts 1 to AA, it costs JW two: the 1 he loses directly plus his opportunity cost from CC. And, given that AA could have made .5 in that round, JW's total cost of shifting one net point of gain to AA is 4! With such a high cost of sharing, what split is fair?
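The four-to-one figure follows from comparing a defection round with the cooperative round it replaces. This is a minimal sketch, assuming per-round payoffs inferred from the numbers in this thread (CC pays AA 0.5 and JW 1; AA defecting against a cooperating JW pays AA 1 and JW -1); the function name is hypothetical.

```python
# Per-round payoffs as (AA, JW); inferred from the thread, not quoted from the book.
CC = (0.5, 1.0)   # both cooperate
DC = (1.0, -1.0)  # AA defects, JW cooperates


def cost_of_sharing(cooperate, defect):
    """JW's loss per point that AA nets by defecting instead of cooperating."""
    aa_net_gain = defect[0] - cooperate[0]   # what AA gains over a CC round
    jw_net_loss = cooperate[1] - defect[1]   # direct loss plus forgone CC payoff
    return jw_net_loss / aa_net_gain


print(cost_of_sharing(CC, DC))  # 4.0
```

Each defection nets AA half a point while costing JW two, so moving one net point to AA takes two defections and costs JW four.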

Oddly, it seems that AA realizes this four-fold cost of sharing - his comment at move 6 indicates he understands the relative payouts. But he still demands a return for his trouble.

Posted by: Ethan on March 4, 2003 06:35 AM

"JW is justified to insist on the mutually beneficial collaboration every turn. In fact, JW might also wish to drive a harder bargain with AA so that he might receive a larger share of the cooperative benefit."

Only an economist could write this. Obviously Alchian has concluded that because the initial payoff was arbitrarily asymmetrical, the only correct conduct of the game is to correct the asymmetry. And obviously that *is* the only just solution, as the initial inequality is unmerited. It's Williams who is being unjust in seeking to perpetuate the unjust initial inequality.

Williams as "the smart one, the one trying to induce cooperation, the one understanding that it was the two of them playing the umpire"?? Williams is in fact the one who doesn't understand basic rules of reciprocity. And reading the comments here, sometimes I really fear that economics training creates a race of selfish creatures who are unable to even *understand* the motivations of others, much less cooperate with them.

Posted by: L. on March 10, 2003 06:31 AM