Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at leat an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
HEURE=0:23
extracttemp=function(Y,M,D){
url=paste(
"http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=",
M,"&Day=",D,sep="")
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE)
docName(doc) <- url
basejour=data.frame(Year=Y,Month=M,Day=D,
Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25])
return(basejour)}
B=NULL
for(y in 1955:2013){
for(d in 1:31){
B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)")
I=which(B$Year==2013) lines(B$X[I],B$Temp[I],col="red") In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing. I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done) correcty=function(Y){ I=which(is.na(Y)) if(length(I)==0){Yc=Y} if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]} return(Yc) } Now, we can compute the p-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary) DF=matrix(NA,2013-1954,3) library(urca) for(y in 1955:2013){ Z=B$Temp[which(B$Year==y)] Zc=correcty(Z) DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value)
DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value) DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value)
}

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF
DFP[,1]=DF[,1]<.05
DFP[,2:3]=DF[,2:3]>.05
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")
plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="")
axis(1)
text(1952,.5,"KPSS")
text(1952,1.5,"PP")
for(y in 1955:2013){
for(i in 1:3){
polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).