# Cross validated question

February 19, 2012
By

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers)

Consider the preliminary question of getting a sequence of N heads out of k tosses, with probability 1-p(N,k). The complementary probability is given by the recurrence formula $p(N,k) = \begin{cases} 1 &\text{if } k
Indeed, my reasoning is that the event of no consecutive N heads out of k tosses can be decomposed according to the first occurrence of a tail out of the first N tosses. Conditioning on whether this first tail occurs at the first, second, …, nth draw leads to this recurrence relation. As I wanted to make sure, I rand the following R code

#no sequence of length N out of k draws
pnk=function(N,k){
if (k

and got the following check:
> k=15
> #N=2
> 1-pnk(2,k)-sum(apply(vale[,-1]*vale[,-k],1,max))/10^6
 6.442773e-05
> #N=3
> 1-pnk(3,k)-sum(apply(vale[,-(1:2)]*vale[,-c(1,k)]*vale[,-((k-1):k)],1,max))/10^6
 0.0004090137

Next, the probability of getting the first consecutive N heads in m≥ N tosses is $q(N,m) =\begin{cases} 0 &\text{if }m $q(N,m) =\begin{cases} 0 &\text{if }m
Both first cases are self-explanatory. the third case corresponds to a tail occurring at the m−N−1th draw, followed by N heads, and prohibiting N consecutive heads prior to the m−N−1th toss. When checking by
Tsim=10^7
S=sample(c(0,1),Tsim,rep=TRUE)
SS=S[-Tsim]*S[-1]
out=NULL
i=2
while (i<=length(SS)){
if ((SS[i]==1)&&(SS[i-1]==1)){
out=c(out,i);i=i+1}
i=i+1}
dif=diff((1:length(SS[-out]))[SS[-out]==1])
trobs=probs=tabulate(dif+(dif==1))/length(dif)[1:20]
for (t in 1:20) trobs[t]=qmn(2,t)
barplot(probs,col="orange2",ylim=c(-max(probs),max(probs)))

I however get a discrepancy shown in the above graph for the cases m=3,4, and N=2, which is be due to the pseudo-clever way I compute the waiting times, removing the extra 1′s… Because the probabilities to wait 3 and 4 times for 2 heads should really be both equal to 1/2³.
Now, the probability to get M heads first and N heads in m≥ N tosses (and no less) is $r(M,N,m) = \begin{cases} \frac{1}{2^N} &\text{if }m=N\\ 0 &\text{if } N $r(M,N,m) = \begin{cases} \frac{1}{2^N} &\text{if }m=N\\ 0 &\text{if } N
The third case is explained by the fact that completions of the first sequence of heads must stop (by a tail) before reaching N heads. Hence the conditional probability of waiting m tosses to get N consecutive heads given the first M consecutive heads is $s(M,N,m) = \begin{cases} \frac{1}{2^{N-M}} &\text{if }m=N\\ 0 &\text{if } N $s(M,N,m) = \begin{cases} \frac{1}{2^{N-M}} &\text{if }m=N\\ 0 &\text{if } N
The expected number can then be derived by $\mathfrak{E}(M,N)= \sum_{m=N}^\infty m\, s(M,N,m)$ $\mathfrak{E}(M,N)= \sum_{m=N}^\infty m\, s(M,N,m)$
or $\mathfrak{E}(M,N)-M$ $\mathfrak{E}(M,N)-M$
for the number of *additional* steps…
Checking for the smallest values of M and N, I got a reasonable agreement with the theoretical value of 2N+1-2M+1(established on Cross validated). (For larger values of M and N, I had to replace the recursive definition of pnk with a matrix computed once for all.)
Filed under: R, Statistics, University life Tagged: conditioning, heads and tails, R, recursion                var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' };

(function(d, t) {
var s = d.createElement(t); s.type = 'text/javascript'; s.async = true;
var r = d.getElementsByTagName(t); r.parentNode.insertBefore(s, r);
}(document, 'script'));

Related
ShareTweet

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Tags: conditioning, heads and tails, R, recursion, statistics, University life