**Xi'an's Og » R**, and kindly contributed to R-bloggers)

**A**nother problem generated by X’validated* (on which I spent much too much time!)*: given an unbiased coin that produced *M* heads in the first *M* tosses, what is the expected number of additional tosses needed to get *N* *(N>M)* consecutive heads?

**C**onsider the preliminary question of getting a sequence of *N* heads out of *k* tosses, with probability *1-p(N,k)*. The complementary probability is given by the recurrence formula

Indeed, my reasoning is that the event of no consecutive *N* heads out of *k* tosses can be decomposed according to the first occurrence of a tail out of the first *N* tosses. Conditioning on whether this first tail occurs at the first, second, …, *n*th draw leads to this recurrence relation. As I wanted to make sure, I rand the following R code

#no sequence of length N out of k draws pnk=function(N,k){ if (k<N){ p=1} else{ p=0 for (j in 1:N) p=p+pnk(N,k-j)/2^j } return(p) }

and got the following check:

> k=15 > #N=2 > 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6 [1] 6.442773e-05 > #N=3 > 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6 [1] 0.0004090137

**N**ext, the probability of getting the first consecutive *N heads* in *m≥ N* tosses is

Both first cases are self-explanatory. the third case corresponds to a tail occurring at the *m−N−1*th draw, followed by* N* heads, and prohibiting *N* consecutive heads prior to the *m−N−1*th toss. When checking by

Tsim=10^7 S=sample(c(0,1),Tsim,rep=TRUE) SS=S[-Tsim]*S[-1] out=NULL i=2 while (i<=length(SS)){ if ((SS[i]==1)&&(SS[i-1]==1)){ out=c(out,i);i=i+1} i=i+1} dif=diff((1:length(SS[-out]))[SS[-out]==1]) trobs=probs=tabulate(dif+(dif==1))/length(dif)[1:20] for (t in 1:20) trobs[t]=qmn(2,t) barplot(probs,col="orange2",ylim=c(-max(probs),max(probs))) barplot(-trobs[1:20],col="wheat",add=TRUE)

I however get a discrepancy shown in the above graph for the cases *m=3,4*, and *N=2*, which is be due to the pseudo-clever way I compute the waiting times, removing the extra 1′s… Because the probabilities to wait 3 and 4 times for 2 heads should really be both equal to 1/2³.

**N**ow, the probability to get *M* heads first and *N* heads in *m≥ N* tosses (and no less) is

The third case is explained by the fact that completions of the first sequence of heads must stop (by a tail) before reaching *N* heads. Hence the conditional probability of waiting *m* tosses to get *N* consecutive heads given the first *M* consecutive heads is

**T**he expected number can then be derived by

or

for the number of *additional* steps…

**C**hecking for the smallest values of *M* and *N*, I got a reasonable agreement with the theoretical value of 2^{N+1}-2^{M+1}(established on Cross validated). (For larger values of *M* and *N*, I had to replace the recursive definition of pnk with a matrix computed once for all.)

Filed under: R, Statistics, University life Tagged: conditioning, heads and tails, R, recursion

**leave a comment**for the author, please follow the link and comment on their blog:

**Xi'an's Og » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...