A million ? what are the odds…

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

50 days ago, I published a post, here, on forecasting techniques. I was wondering what could be the probability to have, by the end of this year, one million pages viewed (from Google Analytics) on this blog. Well, initially, it was on my blog at the Université de Rennes 1 (http://blogperso.univ-rennes1.fr/arthur.charpentier/), but since I transfered the blog, I had to revise my code. Initially, I had that kind of graphs,

and when I look at the cumulative distribution of the number of pages viewed on January first, I had

while for the distribution of the time I should read this million (the dual problem), I obtained

and I said that I should have around 35% chance to reach the million pages viewed by the end of this year.
Here is the updated graph, with the blog à Université de Rennes 1 (still in black) and the one here (in blue, where I add the two blogs together).

Actually, I decided to look at the evolution of the probability to reach the million by New Year’s Eve…

The code looks like that,
> base=read.table(“http://perso.univ-rennes1.fr/arthur.charpentier/million2.csv”,
+ sep=”;”,header=TRUE)
> X2=cumsum(base$nombre)
> X=X1+X2
> kt=which(D==as.Date(“01/06/2010″,”%d/%m/%Y”))
> D0=as.Date(“08/11/2008″,”%d/%m/%Y”)
> D=D0+1:length(X1)
> P=rep(NA,(length(X)-kt)+1)
> for(h in 0:(length(X)-kt)){
+ model  <- arima(X[1:(kt+h)],c(7 ,   # partie AR
+                     1,    # partie I
+                     7),method=”CSS”)   # partie MA
+ forecast <- predict(model,200)
+ u=max(D[1:kt+h])+1:300
+ k=which(u==as.Date(“01/01/2011″,”%d/%m/%Y”))
+ (P[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k]))
+ }
It has been a bit tricky, since I wanted an automatic fit of the ARIMA process, meaning that I had to assess a priori the orders of the ARIMA process. And I had numerical problems, since we got non stationary AR part at least at one period of time considered…. So finally I used here the CSS method which uses conditional-sum-of-squares to find starting values in the optimization procedure.

Actually, if we consider a classical descritption of traders, it looks like I act as a trader (dealing with millions and forgetting about real people): it is the same here, I do not know what a million means, I cannot imagine 250,000 visitors looking at that blog… But I can still do the maths. Anyway, a million is huge when I start to think about it… but perhaps I should not… I cannot possibility imagine that so many people might find interesting my mathematical lucubration*….
* initially I was looking for the analogous of “élucubration” in French, meaning “divagation, absurd theory” (the proper translation might be “rantings” (here) , “ravings” (here) or “wild imagining” (everywhere else here or there)). When I asked Google for a possible translation (here), I got “lucubration” which means “composed by night; that which is produced by meditation in retirement“. Well, it was not initially what I intended to say, but since I usually work on my blog during the night, when I got awake by one of the girls, I decided to keep this word…. At least, I learnt something today, appart for the code mentioned above….

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)