Another problem generated by X’validated (on which I spent much too much time!): given an unbiased coin that produced M heads in the first M tosses, what is the expected number of additional tosses needed to get N (N>M) consecutive heads?
Consider the preliminary question of getting a sequence of N heads out of k tosses, with probability 1-p(N,k). The complementary probability is given by the recurrence formula
Indeed, my reasoning is that the event of no consecutive N heads out of k tosses can be decomposed according to the first occurrence of a tail out of the first N tosses. Conditioning on whether this first tail occurs at the first, second, …, nth draw leads to this recurrence relation. As I wanted to make sure, I rand the following R code
#no sequence of length N out of k draws
pnk=function(N,k){
if (k<N){
p=1}
else{
p=0
for (j in 1:N)
p=p+pnk(N,k-j)/2^j
}
return(p)
}
and got the following check:
> k=15 > #N=2 > 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6 [1] 6.442773e-05 > #N=3 > 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6 [1] 0.0004090137
Next, the probability of getting the first consecutive N heads in m≥ N tosses is
Both first cases are self-explanatory. the third case corresponds to a tail occurring at the m−N−1th draw, followed by N heads, and prohibiting N consecutive heads prior to the m−N−1th toss. When checking by
Tsim=10^7
S=sample(c(0,1),Tsim,rep=TRUE)
SS=S[-Tsim]*S[-1]
out=NULL
i=2
while (i<=length(SS)){
if ((SS[i]==1)&&(SS[i-1]==1)){
out=c(out,i);i=i+1}
i=i+1}
dif=diff((1:length(SS[-out]))[SS[-out]==1])
trobs=probs=tabulate(dif+(dif==1))/length(dif)[1:20]
for (t in 1:20) trobs[t]=qmn(2,t)
barplot(probs,col="orange2",ylim=c(-max(probs),max(probs)))
barplot(-trobs[1:20],col="wheat",add=TRUE)
I however get a discrepancy shown in the above graph for the cases m=3,4, and N=2, which is be due to the pseudo-clever way I compute the waiting times, removing the extra 1′s… Because the probabilities to wait 3 and 4 times for 2 heads should really be both equal to 1/2³.
Now, the probability to get M heads first and N heads in m≥ N tosses (and no less) is
The third case is explained by the fact that completions of the first sequence of heads must stop (by a tail) before reaching N heads. Hence the conditional probability of waiting m tosses to get N consecutive heads given the first M consecutive heads is
The expected number can then be derived by
or
for the number of *additional* steps…
Checking for the smallest values of M and N, I got a reasonable agreement with the theoretical value of 2N+1-2M+1(established on Cross validated). (For larger values of M and N, I had to replace the recursive definition of pnk with a matrix computed once for all.)
Filed under: R, Statistics, University life Tagged: conditioning, heads and tails, R, recursion
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...
Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).
