Monty Hall problem, with Thompson sampling
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We all know the Monty Hall problem. Recently, Jason Rosenhouse published a book on that topic (entitled The Monty Hall Problem, The Remarkable Story of Math’s Most Contentious Brain Teaser). The game is more or less described by the following question
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
While I was preparing some slides for a lecture on Bayesian modeling and thinking, I wanted to find an illustration of what is sometimes called the Bayesian brain, that can be related to updates of beliefs, when we experience. And I was looking for examples of Thompson sampling. And actually, it is possible to learn that switching is the optimal strategy, in the Monty Hall problem, just by playing sequentially the game, and learning from previous strategies. The following code is used, to choose the door with the price (the car), and the one we first select
set.seed(1) n = 5000 listdoor = matrix(1:3,3,n) door = listdoor win = sample(1:3,size=n,replace=TRUE) pick1 = sample(1:3,size=n,replace=TRUE)
Then, the presenter picks one, that is neither the car, nor the one we chose initially. The following trick can be used, to get the list of available choices
door[win+(0:(n-1))*3] = NA door[,1:10] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] NA NA NA 1 NA NA 1 NA 1 NA [2,] 2 2 NA NA 2 2 2 NA NA 2 [3,] 3 NA 3 3 NA NA NA 3 NA NA door[pick1+(0:(n-1))*3] = NA door[,1:10] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] NA NA NA 1 NA NA 1 NA 1 NA [2,] 2 2 NA NA 2 2 2 NA NA 2 [3,] 3 NA 3 3 NA NA NA 3 NA NA
Then, the presenter picks one
presenter = apply(door,2, function(x) sample(x[!is.na(x)],size=1)) > presenter[win != pick1] = apply(door,2,function(x) x[!is.na(x)])[win != pick1] presenter = unlist(presenter) presenter[1:10] [1] 3 2 3 1 2 2 2 3 1 2
Now, let us consider the Monty Hall problem. We have two possible strategies. The first one is to keep the door we chose, initially
pick2a = pick1 gaina = (pick2a==win) mean(gaina) [1] 0.3392
As expected, on average, we win with (about) 1 chance out of 3. The second one is to (always) pick the other door (the one left). The code is close to the one we used before
door = listdoor door[pick1+(0:(n-1))*3] = NA door[presenter+(0:(n-1))*3] = NA pick2b = apply(door,2,function(x) x[!is.na(x)]) gainb = (pick2b==win) mean(gainb) [1] 0.6608
If you know Monty Hall problem the probability to win is now 2 chance out of 3 (which is what the maths tells us). That is what we have with simulations.
Now, what if we don’t know how to do the maths, and we don’t want to compute it? We can use Thompson sampling to explore, and exploit. In a general context, we have to choose among On a le choix entre \(K\) alternatives (here \(K=2\), since we can either keep our initial choice, or pick the other one), and the output is \(\boldsymbol{X}=(X_1,\cdots, X_K)\), where \(X_k\sim\mathcal{B}(\theta_k)\), but \(\theta_k\) is unknow, and we will play the game, and learn. From previous computations, we know that \(\theta_1=1/3\) while \(\theta_2=2/3\).
We use some prior distribution, \(\theta_k\sim\mathcal{B}eta(\alpha_k,\beta_k)\), since the Beta distribution is the conjugate of the Bernoulli. At time \(t\), we draw \(K\) (independent) Beta variables \(B_k\sim\mathcal{B}eta(\alpha_k,\beta_k)\), and pick \(k^\star = \displaystyle{\underset{k=1,\cdots,K}{\text{argmax}}\{B_k\}}\). Here the code will be
set.seed(2) X = cbind(pick2a == win,pick2b == win)*1 AB1 = AB2 = tirage = matrix(NA,n,2) choix = rep(NA,n) k=1 AB1[k,] = AB2[k,] = c(1,1) for(k in 1:(n-1)){ tirage[k,] = c(rbeta(1,AB1[k,1],AB1[k,2]), rbeta(1,AB2[k,1],AB2[k,2])) choix[k] = which.max(tirage[k,]) if(choix[k] == 1){ AB1[k+1,] = AB1[k,] + c(X[k,1],1-X[k,1]) AB2[k+1,] = AB2[k,] } if(choix[k] == 2){ AB1[k+1,] = AB1[k,] AB2[k+1,] = AB2[k,] + c(X[k,2],1-X[k,2]) }}
Before showing some graphs, let us check that indeed, we select more the second strategy (which is here to select the other door)
AB1[n,] [1] 5 13 AB2[n,] [1] 3292 1693
Indeed, since the average of a Beta distribution, \(\mathcal{B}eta(\alpha,\beta)\) is \(\alpha/(\alpha+\beta)\)
AB2[n,1]/(sum(AB2[n,])) [1] 0.6603811
i.e. the probability to win, with this second strategy is about \(2/3\) (as obtained previously). We can visualize this on the animation below, with, in red the first strategy (keep your initial choice), in green the second one (select the other door), 0 and 1 respectively if we win, or not. Then we can visualize the evolution of \(\alpha_2\) and \(\beta_2\) on topc, and \(\alpha_1\) and \(\beta_1\) below (the index is time \(t\)). Finallly, we have the two variables \(B_1\) and \(B_2\) drawn,
Of course, another simulation would have given different \(B_1\)‘s and \(B_2\)‘s, but finally, we learn that the second strategy is better, and we learn it quite fast…
Here is another one (just to confirm)
So clearly, even if we don’t know which is the optimal strategy (keep our initial choice, or switch), a player who played that game about 30 times should be able to understand that switching should be a better strategy.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.