Cross validated question

February 19, 2012

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

Another problem generated by X’validated (on which I spent much too much time!): given an unbiased coin that produced M heads in the first M tosses, what is the expected number of additional tosses needed to get N (N>M) consecutive heads?

Consider the preliminary question of getting a sequence of N heads out of k tosses, with probability 1-p(N,k). The complementary probability is given by the recurrence formula
p(N,k) = \begin{cases} 1 &\text{if } k<N\\  \sum_{m=1}^{N} \frac{1}{2^m}p(N,k-m) &\text{else}\\  \end{cases}
Indeed, my reasoning is that the event of no consecutive N heads out of k tosses can be decomposed according to the first occurrence of a tail out of the first N tosses. Conditioning on whether this first tail occurs at the first, second, …, nth draw leads to this recurrence relation. As I wanted to make sure, I rand the following R code

#no sequence of length N out of k draws
  if (k

and got the following check:

 > k=15
> #N=2
> 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6
[1] 6.442773e-05
> #N=3
> 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6
[1] 0.0004090137

Next, the probability of getting the first consecutive N heads in m≥ N tosses is
q(N,m) =\begin{cases}  0 &\text{if }m<N\\  \frac{1}{2^N} &\text{if }m=N\\  \frac{p(N,m-N-1) }{2^{N+1}} &\text{if } N<m\\  \end{cases}
Both first cases are self-explanatory. the third case corresponds to a tail occurring at the m−N−1th draw, followed by N heads, and prohibiting N consecutive heads prior to the m−N−1th toss. When checking by

while (i<=length(SS)){
  if ((SS[i]==1)&&(SS[i-1]==1)){
for (t in 1:20) trobs[t]=qmn(2,t)

I however get a discrepancy shown in the above graph for the cases m=3,4, and N=2, which is be due to the pseudo-clever way I compute the waiting times, removing the extra 1′s… Because the probabilities to wait 3 and 4 times for 2 heads should really be both equal to 1/2³.

Now, the probability to get M heads first and N heads in m≥ N tosses (and no less) is
r(M,N,m) = \begin{cases}  \frac{1}{2^N} &\text{if }m=N\\  0 &\text{if } N<m<N+M\\  \sum_{r=M+1}^{N}\frac{q(N,m-r)}{2^{r}}&\text{if } m\ge N+M  \end{cases}

The third case is explained by the fact that completions of the first sequence of heads must stop (by a tail) before reaching N heads. Hence the conditional probability of waiting m tosses to get N consecutive heads given the first M consecutive heads is

s(M,N,m) = \begin{cases}  \frac{1}{2^{N-M}} &\text{if }m=N\\  0 &\text{if } N<m<N+M\\  \sum_{r=M+1}^{N}\frac{q(N,m-r)}{2^{r-M}}&\text{if } m\ge N+M  \end{cases}
The expected number can then be derived by

\mathfrak{E}(M,N)= \sum_{m=N}^\infty m\, s(M,N,m)



for the number of *additional* steps…

Checking for the smallest values of M and N, I got a reasonable agreement with the theoretical value of 2N+1-2M+1(established on Cross validated). (For larger values of M and N, I had to replace the recursive definition of pnk with a matrix computed once for all.)

Filed under: R, Statistics, University life Tagged: conditioning, heads and tails, R, recursion

To leave a comment for the author, please follow the link and comment on their blog: Xi'an's Og » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , ,

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)