(This article was first published on Anything but Rbitrary, and kindly contributed to Rbloggers)
This article was first published on analyze stuff. It has been contributed to Anything but Rbitrary in celebration of its introductory post.
By Max Ghenis
Welcome to analyze stuff! For our first post, I wanted to reflect on the time of year; after all, ‘tis the season for hams and yams, caroling and sledding, and of course gifts! One popular party gift exchange game is the White Elephant, where each person brings a wrapped (typically regifted or otherwise oddball) gift, and then picks one in order with the option of “stealing” another unwrapped gift. Relative to other matching problems, for example, the stable marriage problem or the secretary problem, both of which have elegant defined solutions, strikingly little research has been done on the fairness of this game (with perhaps one exception).
Welcome to analyze stuff! For our first post, I wanted to reflect on the time of year; after all, ‘tis the season for hams and yams, caroling and sledding, and of course gifts! One popular party gift exchange game is the White Elephant, where each person brings a wrapped (typically regifted or otherwise oddball) gift, and then picks one in order with the option of “stealing” another unwrapped gift. Relative to other matching problems, for example, the stable marriage problem or the secretary problem, both of which have elegant defined solutions, strikingly little research has been done on the fairness of this game (with perhaps one exception).
As an operations research guy, my tool of choice for these situations tends to be the discrete event simulation. This is a nice approach for understanding systems that are too complex to model closedform, of which White Elephant is a perfect example. R’s combination of statistical functions, random number generators, and encapsulation makes it an ideal language for performing discrete event simulations.
In this article, I use R to create a function representing a single turn, then a single game, and finally an entire simulation of 200,000 games (full code here). Afterward I analyze the results–both with R and interactive charts from Google Sheets–which brings to light a few things about the Yuletide activity (some may surprise you!).
Sample game outcome Player 1 gets gift 3, 2 gets 6, and so on 
The model
As there are many potential variations on the game, let’s start off with the particular rules I implemented:

Each round consists of a new player joining the game, based on a predetermined order

At each round, the player either steals an unwrapped gift or unwraps one herself

Upon someone else’s gift being stolen, the “stealee” makes the same choice of stealing vs. choosing. A round completes when someone unwraps a gift. While this part is consistent, the following rules are subject to variation:

No gift can be stolen more than once per round

No gift can be stolen more than three times total

The game ends when the final gift is unwrapped
I also made some assumptions about the players’ behavior:

Players will steal each round with probability p = (# stealable gifts) / (# stealable + wrapped gifts)

If players choose to steal, they will steal their favorite gift

Players’ preference for each gift is governed by averaging two uniformly–that is, U(0, 1)–distributed factors:

Each gift’s innate utility (players will partially agree that some gifts are better than others)

Preferences differing randomly across players

Gift utilities are completely unknown prior to unwrapping, i.e. I’m assuming players can’t tell the difference between a wrapped fruitcake and a wrapped Ferrari
In each simulation run, a new utility matrix is generated, and the game’s result is stored as the ultimate set of playergift matches, along with the utilities and ranks each player associates with their gift. I ran 10,000 white elephant simulations for each of twenty different scenarios, representing player counts ranging from 1 to 20 (200,000 simulation runs in total, 2.1M rows for each player outcome). You can see the full simulation result here.
> result
n.players player gift steals utility rank run
1: 1 1 1 0 0.5550705 1 1
2: 1 1 1 0 0.5971866 1 1
3: 1 1 1 0 0.4942946 1 1
4: 1 1 1 0 0.6106313 1 1
5: 1 1 1 0 0.7207114 1 1

2099996: 20 16 10 1 0.7030580 10 20
2099997: 20 17 17 0 0.5422361 9 20
2099998: 20 18 18 1 0.6655989 2 20
2099999: 20 19 20 1 0.7376923 4 20
2100000: 20 20 14 2 0.7768777 7 20
Tools
The simulation utilizes a few of my favorite R packages including data.table (used in many of the summaries below), plyr, reshape2, and parallel. Instead of R, I use Google Sheets to create and embed interactive charts easily (this spreadsheet contains all charts). While R is plenty capable of producing sophisticated visualizations, Google Sheets provides an unparalleled interface for making and sharing interactive charts in a familiar format to any analyst.
Results
Let’s begin the analysis by considering the sixplayer scenario. From here we can plot four metrics against the player order: average utility, average rank, and likelihood to get favorite and least favorite gifts.
current.n < 6
# Note that result is a data.table, enabling these operations
result[n.players == current.n,
list(mean.utility = mean(utility),
mean.rank = mean(rank),
pct.favorite.gift = sum(rank == 1) / .N,
pct.least.favorite.gift = sum(rank == current.n) / .N),
player]
While the first couple players obtain similar utility, the final player does significantly better. Relative to the first five players, they are 3.2% more likely to get their favorite gift, but also (unexpectedly) 0.6% more likely to get their least favorite gift.
Kernel density plots of utility reveal again that the final player has the most rightskewed distribution; their mode is 0.70, while the penultimate player’s is 0.59.
This chart shows yet again how similarly the early players perform, particularly the first and second. The notion that the first player is so disadvantaged is so prevalent that allowing them an extra turn has probably become the most common variation. However, this simulation shows that this may worsen the inequity faced by the second player. This also makes sense intuitively: the second player has no advantage over the first since they see only the first gift, whose expected utility equals that of a random wrapped gift.
Varying the number of players
As one might expect, average utility per player increases with the number of players (recall that each gift has a mean utility of 0.5), along with average number of steals per gift:
While the first player gains utility as the player count increases, it’s only slight relative to the final player’s gains:
Ultimately, adding players both reduces inequality (as defined by the Gini coefficient) and efficiency (as defined by utility per turn taken).
Data powering these charts was generated with the following R code:
library(ineq) # For Gini
n.player.summary <
result[, list(mean.steals=mean(steals),
mean.utility=mean(utility),
mean.utility.first.player=
mean(ifelse(player == 1, utility, NA), na.rm=T),
mean.utility.final.player=
mean(ifelse(player == n.players, utility, NA), na.rm=T),
utility.per.turn=mean(utility / (steals + 1)),
gini=ineq(utility, type="Gini")),
n.players]
Optimizing the player count involves balancing efficiency against total utility and equity (though from personal experience, the more the merrier).
Finally, a linear regression model shows that both the number of players and each player’s turn have statistically significant relationships to utility (R2=0.058), though the order is 24x more important.
summary(lm(utility ~ n.players + player, data=result))
Call:
lm(formula = utility ~ n.players + player, data = result)
Residuals:
Min 1Q Median 3Q Max
0.70304 0.12404 0.01915 0.14726 0.47971
Coefficients:
Estimate Std. Error t value Pr(>t)
(Intercept) 5.147e01 4.156e04 1238.34 <2e16 ***
n.players 4.462e04 3.309e05 13.48 <2e16 ***
player 1.057e02 3.309e05 319.28 <2e16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2001 on 2099997 degrees of freedom
Multiple Rsquared: 0.05847, Adjusted Rsquared: 0.05847
Fstatistic: 6.521e+04 on 2 and 2099997 DF, pvalue: < 2.2e1
Closing thoughts
What I found most interesting here is that while larger parties have better outcomes, most of this benefit goes to the later players. The analysis has also spawned new questions:

Does an extra turn from player 1 affect overall utility, and equity of it? Since average utility rose and inequality fell as players were added, one might expect this extra turn to improve outcomes; however, such a rule may further diminish player 2’s already poor fortune.

How about other variations? For example, some games only prohibit immediate stealbacks, while allowing gifts to be stolen multiple times in the same turn. This could reduce the final player’s advantage, since the current rules mean they typically get their first or second choice.

How else could we evaluate fairness? Metrics like average utility and Gini coefficient tell part of the story; other alternatives could include comparisons to stablematch solutions as proposed by GaleShapley, or perhaps an efficiency measure more nuanced than raw utility per turn.

When should players steal? An analysis of optimal stealing strategy could begin with tweaking each player’s stealing probability. However, the more interesting model assumes players try to estimate the utility distribution of all gifts based on those they’ve seen so far. From this distribution, they can weigh the odds that the next gift will surpass their favorite unwrapped one, and decide to steal accordingly. If a player sees ten fruitcakes unwrapped and then a Ferrari, they can be pretty sure the Ferrari is an outlier that won’t be matched in the next gift.
White Elephant’s mechanics ended up more complex and fascinating than initially expected, so I’ll be pursuing some of these questions in a followup post. Stay tuned for that and more in the new year, and happy gifting!
Max
Acknowledgements

Special thanks to Ben Ogorek for numerous suggestions and an extra set of eyes on the code.

Thanks to Mindy Greenberg for a thorough review (in particular, weaning me off my semicolon addiction).

Thanks to Bo Cowgill for introducing me to literature surrounding matching problems.
Resources

R code [GitHub]

Raw simulation result [Google Fusion Tables]

Spreadsheet with charts [Google Sheets]
To leave a comment for the author, please follow the link and comment on their blog: Anything but Rbitrary.
Rbloggers.com offers daily email updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...