**Freakonometrics » R-english**, and kindly contributed to R-bloggers)

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at leat an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl) library(XML) options(RCurlOptions = list(useragent = "R")) HEURE=0:23 extracttemp=function(Y,M,D){ url=paste( "http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=", M,"&Day=",D,sep="") wp <- getURLContent(url) doc <- htmlParse(wp, asText = TRUE) docName(doc) <- url tmp <- readHTMLTable(doc) basejour=data.frame(Year=Y,Month=M,Day=D, Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25]) return(basejour)} B=NULL for(y in 1955:2013){ for(d in 1:31){ B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)") I=which(B$Year==2013) lines(B$X[I],B$Temp[I],col="red")

In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing.

I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done)

correcty=function(Y){ I=which(is.na(Y)) if(length(I)==0){Yc=Y} if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]} return(Yc) }

Now, we can compute the *p*-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary)

DF=matrix(NA,2013-1954,3) library(urca) for(y in 1955:2013){ Z=B$Temp[which(B$Year==y)] Zc=correcty(Z) DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value) DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value) DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value) }

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF DFP[,1]=DF[,1]<.05 DFP[,2:3]=DF[,2:3]>.05 library(RColorBrewer) CL=brewer.pal(6, "RdBu") plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="") axis(1) text(1952,.5,"KPSS") text(1952,1.5,"PP") text(1952,2.5,"ADF") for(y in 1955:2013){ for(i in 1:3){ polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).

**leave a comment**for the author, please follow the link and comment on their blog:

**Freakonometrics » R-english**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...