# A million? What are the odds…

[This article was first published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers].


Fifty days ago, I published a post on forecasting techniques. I was wondering what the probability was of reaching, by the end of this year, one million *pages viewed* (according to Google Analytics) on this blog. Initially, this was for my blog at the Université de Rennes 1 (http://blogperso.univ-rennes1.fr/arthur.charpentier/), but since I transferred the blog, I had to revise my code.

At the time, I produced a first set of graphs, then looked at the cumulative distribution of the number of pages viewed on January first and, for the dual problem, at the distribution of the time at which I should reach this million. I concluded that I had around a 35% chance of reaching one million pages viewed by the end of this year.

Here is the updated graph, with the blog at the Université de Rennes 1 (still in **black**) and the current one (in blue, where I add the two blogs together).

Actually, I decided to look at how the probability of reaching the million by New Year’s Eve has evolved over time…

The code looks like this:

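```r
base <- read.table("http://perso.univ-rennes1.fr/arthur.charpentier/million2.csv",
                   sep = ";", header = TRUE)
X2 <- cumsum(base$nombre)   # cumulative page views read from million2.csv
# X1 is not defined in this snippet; presumably the cumulative page views
# of the Rennes 1 blog, built the same way in the earlier post
X  <- X1 + X2
D0 <- as.Date("08/11/2008", "%d/%m/%Y")
D  <- D0 + 1:length(X1)     # calendar dates; must be defined before kt
kt <- which(D == as.Date("01/06/2010", "%d/%m/%Y"))
P  <- rep(NA, (length(X) - kt) + 1)
for (h in 0:(length(X) - kt)) {
  model <- arima(X[1:(kt + h)],
                 order = c(7,    # AR order
                           1,    # I (differencing) order
                           7),   # MA order
                 method = "CSS")
  # 1 January 2011 is 214 days after 1 June 2010, so a 200-step forecast
  # is too short for the first values of h; use the same horizon as u
  forecast <- predict(model, 300)
  u <- D[kt + h] + 1:300    # dates of the forecast values
  k <- which(u == as.Date("01/01/2011", "%d/%m/%Y"))
  P[h + 1] <- 1 - pnorm(1000000, forecast$pred[k], forecast$se[k])
}
```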
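To make the last step of the loop concrete, here is a minimal sketch of how a forecast mean and standard error turn into a probability of reaching the million, assuming Gaussian forecast errors. The values `pred_mean` and `pred_se` are made up for illustration and are not taken from the blog's data.

```r
# Hypothetical forecast for 1 January 2011 (made-up values, not the blog's data)
pred_mean <- 950000    # forecast mean of the cumulative page views
pred_se   <- 40000     # forecast standard error
target    <- 1000000

# Probability that the cumulative count exceeds the target,
# assuming a Gaussian forecast distribution
p_reach <- 1 - pnorm(target, mean = pred_mean, sd = pred_se)

# Equivalent form using the upper tail directly
p_reach_alt <- pnorm(target, mean = pred_mean, sd = pred_se, lower.tail = FALSE)

# Forecast horizon: number of days between the last observation and the target date
last_obs    <- as.Date("01/06/2010", "%d/%m/%Y")
target_date <- as.Date("01/01/2011", "%d/%m/%Y")
k <- as.numeric(target_date - last_obs)   # 214 days

print(round(p_reach, 3))   # about 0.106, since the target is 1.25 standard errors above the mean
print(k)
```

The horizon `k` shows why the number of steps passed to `predict` has to be at least 214 when the last observation is 1 June 2010.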

It has been a bit tricky, since I wanted an automatic fit of the ARIMA process, meaning that I had to set the orders of the ARIMA process *a priori*. And I had numerical problems, since we got a *non-stationary AR part* for at least one of the periods considered… So in the end I used the CSS method here, which fits the model by minimizing the conditional sum of squares (the default method, CSS-ML, uses the conditional sum of squares only to find starting values for the maximum likelihood optimization).
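One possible variant, which is not what the loop above does, is to try the default fit first and fall back to `method = "CSS"` only when `arima` raises an error. The sketch below uses simulated data (`daily`, `X_sim`) as a stand-in for the blog's page views, and a small order to keep it fast; all of these names and values are assumptions made for illustration.

```r
set.seed(123)

# Simulated stand-in for cumulative page views (not the blog's data)
daily <- rpois(300, lambda = 1500)
X_sim <- cumsum(daily)

# A small order to keep the sketch fast; the post uses c(7, 1, 7)
ord <- c(1, 1, 0)

# Try the default method (CSS-ML) first; if it stops with an error,
# refit with the pure conditional-sum-of-squares method
fit <- tryCatch(
  arima(X_sim, order = ord),
  error = function(e) arima(X_sim, order = ord, method = "CSS")
)

# Forecast 30 steps ahead; pred holds the means, se the standard errors
fc <- predict(fit, n.ahead = 30)
```

With this pattern, the maximum likelihood estimates are kept whenever they can be computed, and the CSS fit is used only for the periods where the default method fails.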

Actually, if we consider a classical description of traders, it looks like I act as a trader (dealing with millions and forgetting about *real* people): it is the same here, I do not know what a *million* means, I cannot imagine 250,000 visitors looking at this blog… But I can still do the maths. Anyway, a million is huge when I start to think about it… but perhaps I should not… I cannot possibly imagine that so many people might find my mathematical lucubration^{*} interesting…

^{*} Initially, I was looking for the English equivalent of the French “élucubration”, meaning “divagation, absurd theory” (the proper translation might be “*rantings*”, “*ravings*” or “*wild imagining*”). When I asked Google for a possible translation, I got “lucubration”, which means “composed by night; that which is produced by meditation in retirement”. Well, that was not what I initially intended to say, but since I usually work on my blog during the night, when I am woken up by one of the girls, I decided to keep this word… At least I learnt something today, apart from the code mentioned above…
