This week I had the opportunity to try something that was in my To-Do list a while ago. The idea came almost instantly after reading Dr. Richard Dawkins book, The Selfish Gene (which was a BD gift, thanks Andy).
I feel the obligated necessity to program my own implementation of the prisoners dilemma and make my own version of the contest. To try different parameters of greediness and vindictiveness, just coding in the name of science.
Here’s a basic definition of prisoners dilemma that can be found in this Wikipedia article:
The structure of the traditional Prisoners’ Dilemma can be generalized from its original prisoner setting. Suppose that the two players are represented by the colors, red and blue, and that each player chooses to either “Cooperate” or “Defect”.
If both players cooperate, they both receive the reward, R, for cooperating. If Blue defects while Red cooperates, then Blue receives the temptation, T payoff while Red receives the “sucker’s”, S, payoff.
Similarly, if Blue cooperates while Red defects, then Blue receives the sucker’s payoff S while Red receives the temptation payoff T. If both players defect, they both receive the punishment payoff P.
And to be a prisoner’s dilemma game in the strong sense, the following condition must hold for the payoffs:
T > R > P > S
I decided to program three basic strategy functions:
tit.for.tat.bot: this simple strategy repeats opponent’s last choice
This bot/strategy is parameter-free
greedy.bot: this strategy is affected by the parameter of greedy.level, which increases the probability of the bot to “defect” when this parameter is close to 1.0. If this parameter is set to 0.5, then the bot will choose randomly.
This bot does not care about the previous action of the other player, it only decides to “defect” or “cooperate” based on its own greediness.
vindictive.bot: this strategy is affected by the parameter of vindictive.level, which is only used when the previous action of the other player is “defect” in the same sense as the greedy.level affects the greedy.bot, otherwise it will “cooperate”.
This means that if the other player plays “cooperate”, this bot will “cooperate” too, but at any moment when the other player plays “defect”, this bot will trigger its own vindictive.level to decide if it will play “defect” or to “cooperate”.
All bots have three parameters as arguments:
action: last action of the other player
*The greedy.level parameter in the tit.for.tat.bot and the vindictive.bot is only used when this bots are playing as player.1 and have to decide the first action/move against player.2. After that event, this parameter is no longer used for these bots.
Each point in the plots means 1000 (iter=1000) plays of player.1 vs player.2
Parameters for the payoff matrix: T=20, R=5, P=1 and S=0
Try different pairwise comparisons of strategies varying the greediness and the vindictiveness of the bots
Make our clandestine fight club with the bots (1st RULE: You do not code about FIGHT CLUB)
1.1) GREEDY BOT VS TIT FOR TAT BOT
Using different thresholds of greediness in a greedy.bot vs tit.for.tat.bot. we can see that the tit.for.tat.bot handles very well when this bot increases its greediness. this bot cooperates when the adversary cooperates, but reacts when the other player decides to “defects”
1.2) VINDICTIVE BOT VS TIT FOR TAT BOT
Using different thresholds of vindictive.level in a vindictive.bot vs tit.for.tat.bot. A) We can see that the vindictive.bot does not “defect” unless provoked. Therefore as these bots “cooperate” along all the simulations, they get the score for 5000 which is the maximum score for mutual cooperation along 1000 iterations. B) For this experiment, we are increasing the greedy.level and the vindictive.level by steps of 0.05. The greedy.level in the vindictive.bot only determines the probability to “defect” in the first move only and only if the vindictive.bot is playing as player.1. We can see that the tit.for.tat.bot handles very well when this bot increases its greediness and the vindictiveness. This bot cooperates when the adversary cooperates, but reacts when the other player decides to “defects”
1.3) GREEDY BOT VS VINDICTIVE BOT
Using different thresholds of greedy.level in greedy.bot vs the effect of vindictive.level in a vindictive.bot. A) greedy.level and vindictive.level increased proportionally equal among the two bots. we can see that the greedy.bot when becomes more greedy it outperform the vindictive.bot because the vindictive.bot only triggers its self-defense strategy when provoked, and the probability to take revenge is based on the vindictive.level. B) greedy.level and vindictive.level increased proportionally inverse among the two bots. This means that the most greedy bot will face with the most forgiving bot.
2.1) FIGHT CLUB: ROUND 1 (0.7 to 1.0 in greedy and vindictive.level)
Tournament of all vs all using a range of parameters from 0.7 to 1.0 in greedy and vindictive.level A) Scatter-plot of each strategy. in x-axis is the mean score of the strategy versus all other strategies when it is playing as player.1 and y.axis when is playing as player.2. We can see that the greedy.bot.1.0 (greedy.level=1.0) has the lowest score followed by the vindictive.bots and the tit.for.tat.bot. The bots that gets higher scores are the ones that have a greedy personality, but not “so-greedy” after all, for example the most successful bot only has 0.7 of probability of “defect”. This experiment is in someway biased because the greedy.bots are taking advantage by exploiting the “nice bots” and getting higher scores for that reason. B) This barplot shows how is the effect of the greediness in the greedy.bot, as it becomes more greedy, the scores tend to be lower. The opposite behavior is shown for the vindictive.bot
2.2) FIGHT CLUB: ROUND 2 (0.8 to 1.0 in greedy and vindictive.level)
To avoid that bias, now we show the experiment with more fierce adversaries
Tournament of all vs all using a range of parameters from 0.8 to 1.0 in greedy and vindictive.level A) Scatter-plot of each strategy. in x-axis is the mean score of the strategy versus all other strategies when it is playing as player.1 and y.axis when is playing as player.2. We can see that the greedy.bot.1.0 (greedy.level=1.0) has the lowest score followed by the vindictive.bots and the not-so-greedy bots. Now the bot that wins the contest is the simplest strategy, tit.for.that. B) This barplot shows how is the effect of the greediness in the greedy.bot, as it becomes more greedy, the scores tend to be lower. The opposite behavior is shown for the vindictive.bot. In general we can conclude that the the vindictive.bots performs better than the greedy.bots because they tend to cooperate unless provoked, so, they are nicer bots.
DISCUSSION AND CONCLUSIONS
NICE “BOTS” FINISH FIRST, as Dr Dawkins concluded in his book.
Dr. Dawkins illustrates the four conditions of tit for tat.
Unless provoked, the agent will always cooperate.
If provoked, the agent will retaliate.
The agent is quick to forgive.
The agent must have a good chance of competing against the opponent more than once.
In conclusion, it is better to be a nice human than a greedy one.