(This article was first published on **Xi'an's Og » R**, and kindly contributed to R-bloggers)

**A**nother problem generated by X’validated *(on which I spent much too much time!)*: given an unbiased coin that produced *M* heads in the first *M* tosses, what is the expected number of additional tosses needed to get *N* consecutive heads (*N*>*M*)?

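**C**onsider the preliminary question of getting a sequence of *N* consecutive heads out of *k* tosses, with probability *1-p(N,k)*. The complementary probability *p(N,k)*, of getting no such sequence, is given by the recurrence formula

$$p(N,k)=\begin{cases}1 & \text{if } k<N,\\ \displaystyle\sum_{n=1}^{N} 2^{-n}\,p(N,k-n) & \text{if } k\ge N.\end{cases}$$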

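Indeed, my reasoning is that the event of no *N* consecutive heads out of *k* ≥ *N* tosses can be decomposed according to the first occurrence of a tail among the first *N* tosses. Such a tail must exist, since otherwise these tosses would form a run of *N* heads. Conditioning on this first tail occurring at the first, second, …, *N*th draw leads to this recurrence relation. The first tail falls at the *n*th draw with probability $2^{-n}$, after which the remaining *k−n* tosses must contain no run of *N* heads. To make sure, I ran the following R code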

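    #no sequence of length N out of k draws
    pnk=function(N,k){
      if (k<N) return(1)
      #condition on the first tail occurring at draw n=1,...,N
      p=0
      for (n in 1:N) p=p+pnk(N,k-n)/2^n
      return(p)}

and got the following check. Here `vale` is a $10^6\times k$ matrix of simulated 0/1 tosses (1 for heads), one sequence per row: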

    > k=15
    > #N=2
    > 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6
    [1] 6.442773e-05
    > #N=3
    > 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6
    [1] 0.0004090137
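For small *k*, the Monte Carlo check can be complemented by an exact one. This enumerates all $2^k$ equally likely sequences and counts those without a run of *N* heads. The following is a minimal sketch of that comparison. `pnk_rec` re-implements the recurrence so that the block runs on its own, and `pnk_enum` is a hypothetical brute-force helper. Neither is taken from the original post.

```r
# Recurrence for p(N, k): probability of no run of N consecutive heads in k
# fair tosses, conditioning on the first tail among the first N tosses
pnk_rec <- function(N, k) {
  if (k < N) return(1)
  p <- 0
  for (n in 1:N) p <- p + pnk_rec(N, k - n) / 2^n
  p
}

# Brute force: proportion of the 2^k equally likely 0/1 sequences
# (1 = head) containing no run of N ones
pnk_enum <- function(N, k) {
  seqs <- as.matrix(expand.grid(rep(list(0:1), k)))
  no_run <- apply(seqs, 1, function(s) {
    r <- rle(s)
    !any(r$values == 1 & r$lengths >= N)
  })
  mean(no_run)
}

# Compare both computations for N = 2, 3 and k = 1..10
for (N in 2:3) for (k in 1:10)
  stopifnot(isTRUE(all.equal(pnk_rec(N, k), pnk_enum(N, k))))
```

Enumeration doubles in cost with every extra toss, so this exact check is only practical up to a dozen or so tosses.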

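Next, the probability of getting the first run of *N* consecutive heads in exactly *m* ≥ *N* tosses is

$$q(N,m)=\begin{cases}2^{-N} & \text{if } m=N,\\ 2^{-(N+1)} & \text{if } m=N+1,\\ 2^{-(N+1)}\,p(N,m-N-1) & \text{if } m\ge N+2.\end{cases}$$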

Both first cases are self-explanatory. The third case corresponds to a tail occurring at the (*m−N*)th draw, followed by *N* heads, and prohibits *N* consecutive heads among the first *m−N−1* tosses. When checking by

    Tsim=10^7
    S=sample(c(0,1),Tsim,rep=TRUE)
    #SS[i]=1 when tosses i and i+1 are both heads
    SS=S[-Tsim]*S[-1]
    #flag pairs of heads overlapping the previous pair (three heads in a row)
    out=NULL
    i=2
    while (i<=length(SS)){
      if ((SS[i]==1)&&(SS[i-1]==1)){ out=c(out,i);i=i+1}
      i=i+1}
    #gaps between successive remaining pairs, a gap of 1 being counted as 2
    dif=diff((1:length(SS[-out]))[SS[-out]==1])
    trobs=probs=(tabulate(dif+(dif==1))/length(dif))[1:20]
    #theoretical probabilities from qmn
    for (t in 1:20) trobs[t]=qmn(2,t)
    #empirical (upwards) versus theoretical (downwards) frequencies
    barplot(probs,col="orange2",ylim=c(-max(probs),max(probs)))
    barplot(-trobs[1:20],col="wheat",add=TRUE)

I however get a discrepancy, shown in the above graph, for the cases *m*=3,4 and *N*=2. This is due to the pseudo-clever way I compute the waiting times, removing the extra 1's… Indeed, the probabilities to wait 3 and 4 tosses for 2 heads should both be equal to 1/2³.
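The function `qmn` called in this check is not shown in the post. Below is a minimal sketch, assuming `qmn(N,m)` returns the probability that the first run of *N* heads is completed exactly at toss *m*, and zero when *m*<*N*. It comes with a direct simulation that tosses the coin until the current run of heads reaches *N*, which sidesteps the pair-removal step. The helper `wait_N`, the seed and the simulation size are illustrative choices.

```r
# Recursive p(N, k): probability of no run of N heads in k fair tosses
pnk <- function(N, k) {
  if (k < N) return(1)
  p <- 0
  for (n in 1:N) p <- p + pnk(N, k - n) / 2^n
  p
}

# Hypothetical qmn(N, m): probability that the first run of N consecutive
# heads is completed exactly at toss m
qmn <- function(N, m) {
  if (m < N) return(0)
  if (m == N) return(2^-N)
  # tail at toss m - N, followed by N heads, with no run of N heads among
  # the first m - N - 1 tosses (p(N, 0) = 1 covers the case m = N + 1)
  pnk(N, m - N - 1) / 2^(N + 1)
}

# Direct simulation: toss until the current run of heads reaches N,
# then return the number of tosses used
wait_N <- function(N) {
  run <- 0
  m <- 0
  while (run < N) {
    m <- m + 1
    run <- if (runif(1) < 0.5) run + 1 else 0
  }
  m
}

set.seed(42)
waits <- replicate(2e4, wait_N(2))
emp <- tabulate(waits, nbins = 20) / length(waits)   # empirical frequencies
theo <- sapply(1:20, function(t) qmn(2, t))          # theoretical values
round(rbind(emp, theo)[, 2:6], 3)
```

Since the recursive `pnk` is only called with arguments up to 17 here, its exponential cost remains negligible.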

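Now, the probability to get *M* heads first and the first run of *N* heads in exactly *m* ≥ *N* tosses (and no less) is

$$r(M,N,m)=\begin{cases}2^{-N} & \text{if } m=N,\\ 0 & \text{if } N<m\le N+M,\\ \displaystyle\sum_{j=1}^{N-M} 2^{-(M+j)}\,q(N,m-M-j) & \text{if } m>N+M,\end{cases}$$

where *q(N,m)* is the probability of the first run of *N* heads being completed at toss *m*, with the convention *q(N,m)*=0 when *m*<*N*.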

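The second case vanishes because a run of *N* heads ending at toss *m* in that range would require a tail at toss *m−N* ≤ *M*. The third case is explained by the fact that the initial run of heads must stop (by a tail) before reaching *N* heads. The first tail then occurs at toss *M+j*, for some 1 ≤ *j* ≤ *N−M*, and the waiting restarts from scratch. Hence the conditional probability $\mathbb{P}(T=m\mid M)$ of waiting *m* tosses to get *N* consecutive heads, given the first *M* consecutive heads, is

$$\mathbb{P}(T=m\mid M)=\begin{cases}2^{-(N-M)} & \text{if } m=N,\\ 0 & \text{if } N<m\le N+M,\\ \displaystyle\sum_{j=1}^{N-M} 2^{-j}\,q(N,m-M-j) & \text{if } m>N+M.\end{cases}$$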
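Here is a hedged numerical sketch of this conditional distribution. The hypothetical helper `cond_vec(M, N, mmax)` returns the conditional probabilities for *m* = 1, …, `mmax`, built from *p(N,k)* tabulated by the recurrence. A second hypothetical helper, `wait_given_M`, simulates tosses starting from a run of *M* heads. The values *M*=1 and *N*=3 are only an example.

```r
# p_table(N, kmax): entry k + 1 is p(N, k), the probability of no run of
# N heads in k fair tosses, filled in by the recurrence
p_table <- function(N, kmax) {
  p <- numeric(kmax + 1)
  for (k in 0:kmax)
    p[k + 1] <- if (k < N) 1 else sum(p[k + 1 - (1:N)] / 2^(1:N))
  p
}

# cond_vec(M, N, mmax): probability that the first run of N heads ends at
# toss m, given that the first M tosses are heads, for m = 1..mmax
cond_vec <- function(M, N, mmax) {
  stopifnot(N > M, mmax > N + M)
  p <- p_table(N, mmax)
  q <- numeric(mmax)                        # q[m] = q(N, m), zero for m < N
  q[N] <- 2^-N
  for (m in (N + 1):mmax) q[m] <- p[m - N] / 2^(N + 1)  # p[m - N] = p(N, m - N - 1)
  out <- numeric(mmax)
  out[N] <- 2^-(N - M)                      # next N - M tosses all heads
  for (m in (N + M + 1):mmax) {
    j <- 1:(N - M)                          # first tail at toss M + j
    out[m] <- sum(q[m - M - j] / 2^j)
  }
  out
}

# Simulation: start from a run of M heads and toss until it reaches N
wait_given_M <- function(M, N) {
  run <- M
  m <- M
  while (run < N) {
    m <- m + 1
    run <- if (runif(1) < 0.5) run + 1 else 0
  }
  m
}

set.seed(7)
sims <- replicate(2e4, wait_given_M(1, 3))
emp <- tabulate(sims, nbins = 12) / length(sims)
theo <- cond_vec(1, 3, 12)
round(rbind(emp, theo), 3)
```

Summing `cond_vec(1, 3, 400)` should give 1 up to a negligible truncation error, since the tail probabilities decay geometrically.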

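The expected number can then be derived by

$$\mathbb{E}[T\mid M]=\sum_{m\ge N} m\,\mathbb{P}(T=m\mid M)$$

or

$$\mathbb{E}[T-M\mid M]=\sum_{m\ge N}(m-M)\,\mathbb{P}(T=m\mid M)$$

for the number of *additional* steps…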
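The following derivation sketch shows why this sum reduces to the closed form $2^{N+1}-2^{M+1}$. It decomposes on what follows the initial *M* heads. With probability $2^{-(N-M)}$, the next *N−M* tosses are all heads and *T*=*N*. Otherwise the first tail occurs at toss *M+j*, 1 ≤ *j* ≤ *N−M*, with probability $2^{-j}$, and the wait restarts from scratch. Write *K*=*N−M* and *E_N* for the expected waiting time for *N* heads from scratch. The computation then uses $\sum_{j=1}^{K}2^{-j}=1-2^{-K}$ and $\sum_{j=1}^{K}j\,2^{-j}=2-(K+2)\,2^{-K}$:

$$
\begin{aligned}
\mathbb{E}[T\mid M] &= N\,2^{-K}+\sum_{j=1}^{K}2^{-j}\,\bigl(M+j+E_N\bigr)\\
&= N\,2^{-K}+(M+E_N)\bigl(1-2^{-K}\bigr)+2-(K+2)\,2^{-K}\\
&= M+E_N+2-2^{-K}\,(E_N+2)\\
&= M+2^{N+1}-2^{M+1}.
\end{aligned}
$$

The last line uses $E_N=2^{N+1}-2$. This value follows from the same decomposition with *M*=0: the third line then reads $E_N=E_N+2-2^{-N}(E_N+2)$, so that $E_N+2=2^{N+1}$. Subtracting the *M* initial tosses leaves $2^{N+1}-2^{M+1}$ additional tosses.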

Checking for the smallest values of *M* and *N*, I got a reasonable agreement with the theoretical value of $2^{N+1}-2^{M+1}$ (established on Cross Validated). (For larger values of *M* and *N*, I had to replace the recursive definition of pnk with a matrix computed once for all.)
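The recursive `pnk` recomputes the same values many times, at a cost growing exponentially in *k*, which presumably motivated the matrix replacement mentioned above. Along those lines, here is a minimal sketch, not the post's code. It tabulates *p(N,k)* once in a matrix `P[N, k+1]`, derives the conditional distribution from it, and compares the expected number of additional tosses with $2^{N+1}-2^{M+1}$. The helpers `p_matrix` and `extra_tosses` are hypothetical. Truncating the sums at `kmax` tosses is an approximation, harmless here because the tail probabilities decay geometrically.

```r
# Tabulate p(N, k) once for N = 1..Nmax and k = 0..kmax: P[N, k + 1] = p(N, k)
p_matrix <- function(Nmax, kmax) {
  stopifnot(kmax >= Nmax)
  P <- matrix(1, nrow = Nmax, ncol = kmax + 1)   # p(N, k) = 1 whenever k < N
  for (N in 1:Nmax)
    for (k in N:kmax)
      P[N, k + 1] <- sum(P[N, k + 1 - (1:N)] / 2^(1:N))
  P
}

# Expected number of additional tosses to reach N consecutive heads, given
# M initial heads, with sums truncated at mmax tosses
extra_tosses <- function(M, N, P, mmax = ncol(P) - 1) {
  stopifnot(N > M, N <= nrow(P), mmax > N + M)
  # q[m] = probability that the first run of N heads ends at toss m
  q <- numeric(mmax)
  q[N] <- 2^-N
  for (m in (N + 1):mmax) q[m] <- P[N, m - N] / 2^(N + 1)  # P[N, m - N] = p(N, m - N - 1)
  # conditional probabilities given the first M tosses are heads
  cond <- numeric(mmax)
  cond[N] <- 2^-(N - M)
  for (m in (N + M + 1):mmax) {
    j <- 1:(N - M)                              # first tail at toss M + j
    cond[m] <- sum(q[m - M - j] / 2^j)
  }
  sum(((1:mmax) - M) * cond)
}

# Compare with 2^(N+1) - 2^(M+1) for small M < N
P <- p_matrix(5, 3000)
for (N in 2:5) for (M in 1:(N - 1))
  cat(sprintf("M=%d N=%d  numeric=%.6f  closed form=%d\n",
              M, N, extra_tosses(M, N, P), 2^(N + 1) - 2^(M + 1)))
```

For instance, `extra_tosses(1, 2, p_matrix(2, 500))` should return a value very close to $2^3-2^2=4$.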

