(This article was first published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers)

Just after arriving in Montréal, at the beginning of September, I discussed the statistics of my blog, and said that it might be possible, or even likely, that by New Year’s Eve over a million pages would have been viewed on my blog (from Google’s counter, here). By the end of October (here) I was very optimistic, but by mid-December (here) the challenge looked likely to fail. And indeed, the million-page target was hit one week later, on January 8th:

    # Daily page views of the previous blog (tab-separated file)
    base=read.table("http://freakonometrics.blog.free.fr/public/data/million1.csv",sep="\t",header=TRUE)
    X1=cumsum(base$nombre)
    X0=X1
    # Daily page views of the new blog
    base=read.table("http://freakonometrics.blog.free.fr/public/data/million2.csv",sep="\t",header=TRUE)
    X2=cumsum(base$nombre)
    # Cumulative total over both blogs
    X=X1+X2
    D0=as.Date("08/11/2008","%d/%m/%Y")
    D=D0+1:length(X1)
    plot(D,X1,xlim=c(as.Date("08/06/2010","%d/%m/%Y"),as.Date("08/02/2011","%d/%m/%Y")),ylim=c(800000,1050000))
    # Target of one million pages, and January 1st
    abline(h=1000000,col="red")
    abline(v=as.Date("01/01/2011","%d/%m/%Y"),col="red")
    points(D,X,col="blue")

Again, the **black** points come from the previous blog (http://blogperso.univ-rennes1.fr/arthur.charpentier/), which was transferred to the new one (http://freakonometrics.blog.free.fr) this autumn. I simply summed the two sets of statistics to get the blue points.

At each date, I fit an ARIMA model and use it to forecast the total number of pages viewed on January 1st, and to calculate the probability of reaching a million page views by that date (using a Gaussian ARIMA model). Actually, here I changed the challenge a little bit, and asked “what would have been the probability of reaching a million page views on January 1st, and on January 8th?”

    # First date at which a forecast is made: June 1st, 2010
    kt=which(D==as.Date("01/06/2010","%d/%m/%Y"))
    Xbase=X
    X=X1+X2
    # Stop at the last observed date: beyond it, D[kt+h] is NA
    n=length(X)-kt
    P1=P2=rep(NA,n+1)
    for(h in 0:n){
      # Refit on the data available at date D[kt+h]
      model <- arima(X[1:(kt+h)],c(7,1,7),method="CSS")
      forecast <- predict(model,200)
      # Forecast dates (only the first 200 have a prediction, later ones give NA)
      u=max(D[1:(kt+h)])+1:300
      if(min(u)<=as.Date("01/01/2011","%d/%m/%Y")){
        k=which(u==as.Date("01/01/2011","%d/%m/%Y"))
        P1[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k])
      }
      k=which(u==as.Date("08/01/2011","%d/%m/%Y"))
      P2[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k])
    }
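The probability computed at each date can be isolated in a small, self-contained sketch. It is only an illustration under stated assumptions: the series is simulated (about 1,500 views a day, not the blog's counters), the helper name `prob_reach` is hypothetical, and the model is a simpler ARIMA(1,1,0) with a drift term passed through `xreg`, whereas the code above uses an ARIMA(7,1,7) fitted by CSS. The idea is the same: the h-step forecast is treated as Gaussian with mean `pred` and standard deviation `se`, so the probability of exceeding the target is one minus the normal distribution function evaluated at the target.

```r
# Simulated cumulative page views (NOT the blog's data): about 1,500 views
# a day for 200 days, starting from an arbitrary level of 650,000.
set.seed(1)
n <- 200
total <- 650000 + cumsum(rpois(n, lambda = 1500))

# Hypothetical helper: probability that the series exceeds `target`
# h days after the last observation, under a Gaussian forecast.
prob_reach <- function(x, h, target) {
  t_obs <- seq_along(x)
  # ARIMA(1,1,0) with a linear time regressor, i.e. a drift term (an assumption)
  fit <- arima(x, order = c(1, 1, 0), xreg = t_obs)
  fc <- predict(fit, n.ahead = h, newxreg = length(x) + seq_len(h))
  # P(X > target) when X ~ N(pred, se^2)
  1 - pnorm(target, mean = fc$pred[h], sd = fc$se[h])
}

p_30 <- prob_reach(total, h = 30, target = 1e6)  # 30 days ahead
p_37 <- prob_reach(total, h = 37, target = 1e6)  # one week later
```

The simulated series ends near 950,000, so at 1,500 views a day it needs roughly 33 more days to reach a million. The 30-day probability should therefore come out close to 0 and the 37-day one close to 1, which mirrors the January 1st versus January 8th comparison.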

The red curve is the probability of reaching 1 million page views by January 1st (as done earlier, using an ARIMA projection). The blue one is the probability of reaching 1 million page views one week later, on January 8th.

And here is the difference between the two probabilities.
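Drawing the two curves and their difference could look like the sketch below. The helper `plot_reach_probs` is a hypothetical name, and the input vectors are placeholders with no relation to the blog's actual results; with the real data one would pass the dates of the forecasts together with the two probability vectors (`P1` and `P2` in the code above).

```r
# Hypothetical helper: draws the two probability curves (top panel)
# and their difference (bottom panel). NA values are left as gaps.
plot_reach_probs <- function(dates, p1, p2) {
  stopifnot(length(dates) == length(p1), length(p1) == length(p2))
  op <- par(mfrow = c(2, 1))
  on.exit(par(op))
  plot(dates, p1, type = "l", col = "red", ylim = c(0, 1),
       xlab = "", ylab = "probability")
  lines(dates, p2, col = "blue")
  legend("topleft", legend = c("January 1st", "January 8th"),
         col = c("red", "blue"), lty = 1, bty = "n")
  gap <- p2 - p1
  plot(dates, gap, type = "l", xlab = "", ylab = "difference")
  abline(h = 0, lty = 2)
  invisible(gap)
}

# Placeholder inputs, unrelated to the blog's actual results
dates <- as.Date("01/06/2010", "%d/%m/%Y") + 0:29
p2 <- seq(0.3, 0.9, length.out = 30)
p1 <- p2^2

if (interactive()) plot_reach_probs(dates, p1, p2)
```

The function returns the difference invisibly, so it can be reused without recomputing it.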

The flat part at the beginning of November corresponds to the bump that was observed on the initial graph. But then the slope was too low, and in December the challenge was lost… Obviously, looking at statistics during a blog migration is not a bright idea…
