From one extreme (0) to another (1): challenge failed, but who cares…

January 9, 2011

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Just after arriving in Montréal, at the beginning of September, I
discussed the statistics of my blog, and said that it might be possible, or
even likely, that by New Year’s Eve over a million pages would have been
viewed on my blog (from Google’s counter, here). By the end of October (here) I was very optimistic, but by mid-December (here) the challenge was likely to fail. And indeed, the million-page target was hit one week late, on January 8th.

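# page views per day on the previous blog (black points)
base=read.table("",sep="\t",header=TRUE)
X1=cumsum(base$nombre)
X0=X1
# page views per day on the new blog
base=read.table("",sep="\t",header=TRUE)
X2=cumsum(base$nombre)
# total over both blogs (blue points)
X=X1+X2
D0=as.Date("08/11/2008","%d/%m/%Y")
D=D0+1:length(X1)
plot(D,X1,xlim=c(as.Date("08/06/2010","%d/%m/%Y"),as.Date("08/02/2011","%d/%m/%Y")),ylim=c(800000,1050000))
# the one-million target and the January 1st deadline
abline(h=1000000,col="red")
abline(v=as.Date("01/01/2011","%d/%m/%Y"),col="red")
points(D,X,col="blue")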

Again, the black points are from the previous blog, which was transferred to the new one this autumn. So I simply sum the stats of the two blogs to get the blue points.
At each date, I fit an ARIMA model and use it to forecast the
total number of pages viewed on January 1st, and to compute the
probability of reaching a million page views by that date (using a
Gaussian ARIMA model). Actually, here, I changed the challenge a little
bit, and asked “what would have been the probability of reaching a million page views on January 1st, and on January 8th?”
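To make the probability calculation concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the Poisson page views, the names `X_sim`, `t_obs`, `h1` and `h2`, and the small ARIMA(1,1,0) with a drift term (passed through `xreg`), which is not the (7,1,7) model used in this post. With a Gaussian forecast of mean m and standard error s at the target date, the probability of exceeding one million is simply 1 - pnorm(1e6, m, s).

```r
set.seed(1)

# Simulated daily page views (hypothetical numbers, NOT the blog's data)
daily <- rpois(380, lambda = 2500)
X_sim <- cumsum(daily)          # cumulative page views, playing the role of X
t_obs <- seq_along(X_sim)       # time index, used as a drift regressor

target <- 1e6                   # the one-million threshold
h1 <- 20                        # hypothetical number of days to "January 1st"
h2 <- h1 + 7                    # one week later, "January 8th"

# ARIMA(1,1,0) with drift: an assumption, much smaller than the post's model
fit <- arima(X_sim, order = c(1, 1, 0), xreg = t_obs, method = "ML")
fc  <- predict(fit, n.ahead = h2, newxreg = length(X_sim) + 1:h2)

# Gaussian forecast: P(X_T > target) = 1 - pnorm(target, mean, sd)
p1 <- 1 - pnorm(target, mean = fc$pred[h1], sd = fc$se[h1])
p2 <- 1 - pnorm(target, mean = fc$pred[h2], sd = fc$se[h2])
c(p1 = p1, p2 = p2, diff = p2 - p1)
```

Because the cumulative series keeps increasing, the probability at the later date should not be smaller than the one at the earlier date, which is why the two curves discussed in this post can only differ in one direction.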

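# start the exercise on June 1st, 2010
kt=which(D==as.Date("01/06/2010","%d/%m/%Y"))
Xbase=X
X=X1+X2
P1=P2=rep(NA,(length(X)-kt)+7)
for(h in 0:(length(X)-kt+7)){
  # fit on the data available at date kt+h, then forecast 200 days ahead
  model <- arima(X[1:(kt+h)],c(7,1,7),method="CSS")
  forecast <- predict(model,200)
  # dates of the forecast horizon
  u=max(D[1:kt+h])+1:300
  if(min(u)<=as.Date("01/01/2011","%d/%m/%Y")){
    # probability of exceeding one million on January 1st
    k=which(u==as.Date("01/01/2011","%d/%m/%Y"))
    (P1[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k]))
  }
  # probability of exceeding one million on January 8th
  k=which(u==as.Date("08/01/2011","%d/%m/%Y"))
  (P2[h+1]=1-pnorm(1000000,forecast$pred[k],forecast$se[k]))
}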

The red curve is the
probability of reaching 1 million page views by January 1st (as done earlier,
using an ARIMA projection). The blue one is
the probability of reaching 1 million page views one week later, on January 8th.

And here is the difference between the two probabilities.

The flat part at the beginning of November corresponds to the bump that
was observed on the initial graph. But then the slope was too low, and
in December the challenge was lost… Obviously, looking at statistics during a blog migration is not a bright idea…
