**Obscure Analytics » Rstats**, and kindly contributed to R-bloggers)

While I dig into conjugacy and the calculation of Bayesian credibility intervals, I figured it’d be good to put some of my other little rabbit holes up here on the off chance they’re interesting to someone. For some reason I have been in the position of attempting to explain the Monty Hall problem and its curious result to a few people in the last month (me, the reason is me, I am the reason, I randomly bring it up in conversation apropos of nothing).

For anyone who doesn’t know, the Monty Hall problem is a now classic probability thought experiment that goes something like:

You are a contestant on the famous television game show “Lets Make A Deal”. The host of the show, Monty Hall, points towards a wall with three doors, numbered 1, 2 and 3, and says, “Behind two of those doors is your garden variety mountain goat, completely worthless to you, but behind just one of those doors is A NEW CAR!”. He then tells you to pickadooranydoor, after which instead of opening the door you chose, he proceeds to open one of the two remaining doors, revealing a goat behind. He then gives you the chance to either stay on your door and claim the prize behind, or switch to the only remaining door and claim the prize behind that. The question is: should you switch?

If you’re like most people, and you haven’t seen this before in some probability lecture, you’d likely say that it doesn’t matter, that either door is likely to contain the car or the goat. 50/50. But actually, the logical and mathematical answer to the question is that **it is ALWAYS better to switch**.

You have a one-in-three chance of choosing the right door in the first place and so a two-in-three chance of choosing the wrong door. It is more likely that you choose the wrong door initially. If you choose the right door initially, Monty can open either of the other two doors to reveal a goat and if you switch you will lose. If you choose the wrong door initially, Monty only has one door left to open to reveal a goat, as he can’t open your door and the other door reveals the car.

Probabilistically, after Monty opens that door, the door you chose initially still has that same one-in-three chance of containing a car. But now the door that is left has all of that left-over TWO-in-three chance of containing that car because of the initial two-in-three chance that YOU chose the wrong door. So always switch.

The initial reaction of my wife and a few other people to this was something along the lines of: “Idon’tlikethatthatmakesmeangrystoptalking.” To be fair, that is most people’s reaction to my nonsense. Nevertheless, it got me thinking about how best to visualize this problem to make the result more intuitive.

Aside: There is at least one other thing that SHOULD bother you about all this. A goat is ONLY completely worthless to silly fat Americans, right? I mean, yes, most people would value a new car over a goat, but it’s a bit much to equate a GOAT with WORTHLESS. You can literally live off a goat. Drink its milk. Make cheese that people will buy for ten dollars and call ‘chev-RA’ or however you pronounce that. Goat meat is delicious when cooked right. A goat definitely is not worthless.

Okay. So my first attempt was just to code up the simulation (a “Monte Carlo” simulation, as it’s referred to, hence the clever title that has probably been used ten thousand times) in R:

########## Simulation Loop ######### for (i in seq(n)){ ## 1. Randomly place prize behind one of three doors PlacePrize <- c(1,0,0)[sample(1:3,3)] ## 2. Randomly pick one of three doors YouPick <- Doors[sample(1:3,1)] ## 3. Monty either randomly opens one of the two doors left over if ## you happen to pick the correct door or picks the only door left ## if you pick one of two incorrect doors MontyOpens <- ifelse(PlacePrize[Doors==YouPick]==1, Doors[!Doors%in%YouPick][sample(1:2,1)], Doors[(!Doors%in%c(YouPick,Doors[PlacePrize==1]))]) PrizeIsBehind <- Doors[PlacePrize==1] ## 4. If the prize is behind the leftover door, you win if you switch. ## Else you win if you stick on your original choice. WinIfSwitch <- ifelse(PlacePrize[!Doors%in%c(YouPick,MontyOpens)]==1,1,0) Picks <- c(Picks, YouPick) Opens <- c(Opens, MontyOpens) WinningDoor <- c(WinningDoor, PrizeIsBehind) WinsIfSwitch <- c(WinsIfSwitch, WinIfSwitch) ### Write results to data frames PlacedDf[i,] <- PlacePrize PicksDf[i,YouPick] <- 2 OpensDf[i,MontyOpens] <- 3} ########## End Simulation Loop ######### |

The key in the above code is that all the choices, the placement of the prize, your choice of a door, your choice to switch, are random EXCEPT Monty’s choice of which door to open. That choice is conditional on your choice, as if you choose the wrong door, he is constrained to only one of the two remaining. That’s why #3 is not just a simple random function call, it is a conditional statement on the results of a random function.

Plotting the results, first of three trials or runs of the simulation, gives us:

In which, probably confusingly, randomness didn’t go our way and for all three trials, we would have won if we switched doors. That’s the thing about probability, though. It is not that the course of action a probability-based analysis suggests always turns out to be the right one each time an event occurs, but that in the aggregate, given what you know about the situation and the type of uncertainty you are dealing with, the course of action suggested is the right one to take BEFORE you know the outcome.

Also likely to be confusing, as in these particular 10 runs, we got unlucky again, and only 50 percent of the time we were likely to win if we switched.

But as n (number of trials) gets higher and higher, the result becomes more and more obvious, first for 100:

It is obvious from these plots of high numbers of simulations that it is the right choice to switch. I mean, the orange bar is almost exactly twice as large as the purple. Don’t you see? Yeah, my wife was still confused and mildly annoyed as well. This plot shows the RESULT clearly but gives no additional intuition to help understand the WHY. Fail.

I have another plot that I’m slowly tinkering with that I hope will actually demonstrate the why of the thing, because I REALLY want to explain this to my wife for some reason. But for now, here’s my quick and dirty R code to simulate the data and generate the above plots:

#### Monty Hall Monte Carlo #### Rob Mealey library(ggplot2) library(RColorBrewer) library(reshape2) ### Function: Run simulation n times and plot results in stacked bar histograms montyMonte <- function(n,titleSize=7,legendTitle=5,ytextSize=5,xtextSize=5){ Picks <- c() Opens <- c() WinningDoor <- c() WinsIfSwitch <- c() PlacedDf <- matrix(nrow=n, ncol=3) OpensDf <- matrix(nrow=n, ncol=3) PicksDf <- matrix(nrow=n, ncol=3) Doors <- c('Door 1', 'Door 2', 'Door 3') colnames(PlacedDf) <- Doors colnames(PicksDf) <- Doors colnames(OpensDf) <- Doors ########## Simulation Loop ######### for (i in seq(n)){ ## 1. Randomly place prize behind one of three doors PlacePrize <- c(1,0,0)[sample(1:3,3)] ## 2. Randomly pick one of three doors YouPick <- Doors[sample(1:3,1)] ## 3. Monty either randomly opens one of the two doors left over if ## you happen to pick the correct door or picks the only door left ## if you pick one of two incorrect doors MontyOpens <- ifelse(PlacePrize[Doors==YouPick]==1, Doors[!Doors%in%YouPick][sample(1:2,1)], Doors[(!Doors%in%c(YouPick,Doors[PlacePrize==1]))]) PrizeIsBehind <- Doors[PlacePrize==1] ## 4. If the prize is behind the leftover door, you win if you switch. ## Else you win if you stick on your original choice. WinIfSwitch <- ifelse(PlacePrize[!Doors%in%c(YouPick,MontyOpens)]==1,1,0) Picks <- c(Picks, YouPick) Opens <- c(Opens, MontyOpens) WinningDoor <- c(WinningDoor, PrizeIsBehind) WinsIfSwitch <- c(WinsIfSwitch, WinIfSwitch) ### Write results to data frames PlacedDf[i,] <- PlacePrize PicksDf[i,YouPick] <- 2 OpensDf[i,MontyOpens] <- 3} ########## End Simulation Loop ######### WinsIfSwitches <- ifelse(WinsIfSwitch==1, 'Switch Door = Win','Switch Door = Lose') Games <- data.frame(Picks, Opens, WinningDoor, WinsIfSwitches) Wins <- sum(WinsIfSwitch)/n Games <- melt(Games,measure.vars=c('Picks', 'Opens', 'WinningDoor', 'WinsIfSwitches')) Games$variable <- ordered(Games$variable, levels=c('WinsIfSwitches', 'WinningDoor', 'Opens','Picks')) PicksDf[is.na(PicksDf)] <- 0 OpensDf[is.na(OpensDf)] <- 0 ResultsDf <- rbind( data.frame('Type'=rep('Placed',n*3),melt(PlacedDf,measure.vars=Doors)), data.frame('Type'=rep('Picked',n*3),melt(PicksDf,measure.vars=Doors)), data.frame('Type'=rep('Opens',n*3),melt(OpensDf,measure.vars=Doors))) colnames(ResultsDf) <- c('Type','Trial','Door','value') # Plot stacked bar histograms of your picks, monty's opens, winning doors # and win if switch ggplot(Games, aes(x=variable, fill=factor(value))) last_plot() + geom_histogram() last_plot() + scale_x_discrete(labels=rev(c('Your Picks',"Monty's Opens", 'Winning Door','Switch=Win/Lose'))) last_plot() + scale_fill_brewer(type='qual',palette=6) + xlab('') + ylab('') last_plot() + theme_bw() + coord_flip() last_plot() + opts(title = paste('Monty Hall Monte Carlo Total Simulation Results, N = ',n,', Pct Switches Win = ',Wins,sep=''), legend.position='bottom',legend.title=theme_blank()) last_plot() + opts(plot.title = theme_text(size=titleSize), legend.text = theme_text(size=legendTitle), axis.text.y = theme_text(size=ytextSize), axis.text.x = theme_text(size=xtextSize)) ggsave(paste('MontyMonteHistograms',n,'.png',sep=''),width=5, height=3) WinsIfSwitches <- factor(WinsIfSwitches) ResultsDf$lineTypes <- ordered(rep(WinsIfSwitches,3*3), levels=rev(levels(WinsIfSwitches))) ResultsDf$Trial <- ordered(ResultsDf$Trial,levels=rev(seq(nrow(ResultsDf)))) return(ResultsDf)} trialLengths <- c(3,10,100,1000,10000) resultsList <- list() for (i in seq(length(trialLengths))){resultsList[[i]] <- montyMonte(trialLengths[i])} |

Good times.

**leave a comment**for the author, please follow the link and comment on their blog:

**Obscure Analytics » Rstats**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...