Flip a fair coin 4x. Probability of H following H is 40%???

[This article was first published on Econometrics By Simulation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A recent working paper has come out arguing for the existence of Hot Hands (in basketball), a concept psychologists had dismissed decades ago. Hot hands is where a player is thought to have a higher likelihood of scoring the next basket if the last three baskets where shot successfully. (In NBA Jam, that is when you hands caught on fire).

The first paragraph of the paper reads, “Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this empirical probability of heads to be equal to the true probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample one million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.”

In other words $$P(H_t|H_{t-1}) \ne .5$$ even though $$P(H)=.5$$

If you are anything like me, you will say, “WTF, that can’t be true!”

Before getting any further into the logic of the paper, let us do some simulations.

First off, could this be the result of looking only after heads values?

That is, perhaps by selecting only heads values we are reducing the number of available heads.

But, no, this does not make sense!

# x is a binary random variable drawn from 1000^2 draws with a .5 probability  of success.

x <- rbinom(10^6,1,.5)  mean(x[length(x)][x[1]==1])

# Pretty much as close as you can get to 50% as possible   

# To make this feel more concrete for me I ended up flipping a coin 252 times 
# with the following results (1=heads)

myx <- c(1,1,0,1,1,1,1,1,0,1,1,1,0,1,0,1,1,1,1,1,1,1,0,0,0,1,0,1, 1,1,1,0,1,0,1,1,0,1,1,1,1,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0, 0,1,1,0,0,0,0,1,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,1,0,0, 1,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,1,0,1,0,1,1,1,0,0,0,1,0, 1,0,1,1,0,1,1,1,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,1, 0,1,1,1,1,0,0,1,0,0,0,0,1,0,1,1,1,0,1,1,1,0,1,0,1,0,0,1, 0,0,1,1,0,1,1,0,0,1,0,1,1,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0, 0,1,0,0,0,0,0,1,0,0,1,1,0,1,0,1,1,0,1,1,1,1,0,1,0,0,1,0, 0,1,0,0,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1,1,1,0,1,1,1,0,1,0)  # We can see we drew 131 heads or 51.9% mean(myx);sum(myx)  # This is very close to 50%. What is the likelihood of heads coming 

# up this many times on a fair coin?
plot(100:152,pbinom(100:152, 252, .5), type='l', lwd=3, 
     main="CDF of binomial(252,.5)", 
     xlab="# of Heads")
abline(v=sum(myx), lwd=3, lty=2) 
# The likelihood of gettings 131 or higher given the coin is fair is 
# about 24.4%
1-pbinom(sum(myx), 252, .5)
# I am fairly confident therefore that the coin is therefore fair. 
# However, it could be somewhat unfair and still achieve this outcome 
# without straining the bounds of probability. 
# But this is not the point.
# Now let's look at the claim. The probability of observing a heads 
# after tossing a heads is argued to be .4.
# So let's imagine a million fair coins sequences of length 4. Say set Y.
# This set will be composed of the outcomes of 10^6 probabilities 
# where those probabilities are the likelihood that 
# the coin value that follows a heads is a head.
# I write function nheads for this purpose
nhead <- function(flips=4, N=10^6, prev=1)  {
  Y <- rep(NA,N)
  for (i in 1:N) {
    # Draw four binomial draws
    x <- rbinom(flips,1,.5)
    # Now keep only draws after any heads
    Y[i] <- mean(x[-length(x)][x[-1]==prev]) 
# Now let's take the average across the coins
mean(nhead(4,10^6), na.rm=TRUE)
# Damn, sure enough! The expected number of heads is 40.8%
# Let's see if the empirical results adhere. We can think of the 
# 252 previous coinflips as 63 four coin flip trials.
myset <- matrix(myx, 4)
myY <- Y <- rep(NA,63)
# Now let's calculate the likelihood of heads after the first heads in each set
for (i in 1:ncol(myset)) myY[i] <- mean(myset[-4,i][myset[-1,i]==1])
# Pooling across sets
mean(myY, na.rm=TRUE)
# We get 34.2% which could be interpretted as an unluckily low deviation 
# from 50% except it is surprisingly close to 40%.
# That is 19.5 heads 
sum(myY, na.rm = TRUE)
# However the total number of trials that returned a result is 57
# If we believed that the true expected number of heads was .5 then the 
# likelihood of the 19 heads or less appearing is less than 1%.
pbinom(sum(myY, na.rm = TRUE), sum(!is.na(myY)), .5)
# This seems compelling and surprising evidence before even looking at 
# the arguments presented in the paper.
# Looking at the paper, Table 1 is quite convincing. It is organized 
# by # of Heads.
#    Heads            Sequence        P(H|H)
#1   0                TTTT            -
#2   1                TTTH            -
#3   1                TTHT            0
#4   1                THTT            0
#5   1                HTTT            0
#6   2                TTHH            1
#7   2                THTH            0
#8   2                THHT            1/2
#9   2                HTTH            0
#10  2                HTHT            0
#11  2                HHTT            1/2
#12  3                THHH            1
#13  3                HTHH            1/2
#14  3                HHTH            1/2
#15  3                HHHT            2/3
#16  4                HHHH            1
# To come up with the P(H|H) we take the average of the feasible outcomes:
# Sure enough! 40.47
# So what is driving this? Selection bias certainly! But from where?
# Is it the two values that cannot be computed because no heads are generated?
# Even if we specify those as heads this does not fix the probabilities.
# 48.9
# So it must be a feature of the series since we know that 
# if we sample 6 million then we are spot on (near) 1/2.
# Let's see what we can discover
I <- 60
Y <- rep(NA, I-1)
for (i in 2:I) Y[i-1] <- mean(nhead(i,10^5), na.rm=TRUE)
plot(2:I, Y, cex=1, lwd=2, 
     xlab="# of coins")
abline(h=.5, lwd=3) 
# This graph can be quite disorienting. 

It shows that only in the case of flipping two coins is the probability of observing a head after the first head  equal to .5 (well .5 with measurement error).

There is still a significant divergence from .5 even in the case of 60 coints being flipped!

This is so non-intuative as to be downright disorienting.

Does this mean that if someone is flipping four coins and one of them pop up heads you can bet them $10 that the next result is a tail and feel confident the odds are in your favor?

We can see this by looking at our probability table with each sequence arranged by the first head up to that point.

The answer is not quite. I have added up the likelihood of seeing a heads after the previous heads that appears in both sequences (H|1H, H|2H, H|3H).

Heads        Sequence        P(H|H)     H|1H   H|2H   H|3H
  0              TTTT           -
  1              TTTH           -
  2              TTHH           1        1
  1              TTHT           0        0

  2              THTH           0        0
  1              THTT           0        0
  3              THHH           1        1      1
  2              THHT           1/2      1      0

  1              HTTT           0        0
  2              HTTH           0        0
  2              HTHT           0        0      0
  3              HTHH           1/2      0      1
  3              HHTH           1/2      1      0
  2              HHTT           1/2      1      0
  3              HHHT           2/3      1      1      0
  4              HHHH           1        1      1      1 

Thus we can see that according to this, no matter that the sequence the likelihood of the next flip being a head after the first is .5. That is, equal number of heads as tails.

How about the second head (H|1H)? Looking over the table we can see the same thing. As well as for the 3rd head.

From this we can see that in part our intuition is correct. Knowing the previous outcome in the sequence does not provide information on future outcomes. THANK THE GODS!

To get some insight into what is happening, you have to think about how we are scoring each sequence. After each flip of a head we score either 0 or 1. If this is the end of a sequence then we stop there and things work fine: TH (-), TT (-), HT (0), HH (1) -> P(H|H)=1/2.

If it is not the end of a sequence let's see what happens.
THT (0), THH (1), TTT (-), TTH (-), {this part looks okay so far}

HTT (0), HTH (0), HHT ((1+0)/2=1/2), HHH (1).

Now we can begin to see what is happening with the coin flip.

The previously unscorable series TH now becomes scorable. No problem.

The unscorable series TT remanes unscorable.

And the zero scored series HT remains zero scoring!

To review, we now have a previous sequence TH which was unscoreable and now has a probability of Heads of .5 (confirming our expectations). We had a sequence TT which was discarded and remains discarded. And we had a sequence HT which was scored 0 and remains scored 0.

So all of the action must be happening in the HH part of the series!

And we can now think of what is happening. The problem is in the averaging across heads within a series. In this case, by flipping HH this means that the next flip must be evaluated. With HT the next flip does not matter, the sequence will be scored 0 no matter what.

However, with HH, we know that the next flip can be either H or T which means HHT (scored (1+0)/2=.5) and HHH (scored (1+1)/2=1) and $$ mean(H|HH)=.75$$. Thus effectively all other probabilities in the sequence remain the same except what was previously a P(H|HH)=1 is now a P(H|HH)=-.75.

But why? Well, there is a selection issue going on. In the case of HH we are asking what the next score is and rounding our previous positive result to match that. But in the case of say HT we not asking what our next result is and rounding. Thus an isolated few negative outcomes are kept while a few positive outcomes are downgraded resulting in an overall selection effect which reduces the likelihood of seeing a head overall after seeing a previous head.

Now that we can see why this is happening, why does it ever go away? The truth is, that it never goes away. The deviation from .5 gets really small as the sample size gets large. This is because the selective scoring (last value scored) is only happening on a smaller and smaller portion of the series.

I am tempted to say something about the larger implications on series analysis. However, I don't really know how well this sampling issue is known to time series people. I do not recall hearing of it in my meager studies of the matter, but frankly that could mean very little.

To leave a comment for the author, please follow the link and comment on their blog: Econometrics By Simulation.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)