Temperatures Series as Random Walks

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last year, I did mention in a post that unit-root tests are dangerous, because they might lead us to strange models. For instance, in a post, I did obtain that the temperature observed in January 2013, in Montréal, might be considered as a random walk process (or at leat an integrated process). The code to extract the data has changed (since the website has been updated), so here, we use

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
HEURE=0:23
extracttemp=function(Y,M,D){
url=paste(
"http://climate.weather.gc.ca/climateData/hourlydata_e.html?timeframe=1&Prov=QC&StationID=5415&Year=",Y,"&Month=",
M,"&Day=",D,sep="")
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
basejour=data.frame(Year=Y,Month=M,Day=D,
Hour=HEURE,Temp=as.numeric(as.character(data.frame(tmp[2])[,2]))[2:25])
return(basejour)}
B=NULL
for(y in 1955:2013){
for(d in 1:31){
B=rbind(B,extracttemp(y,1,d))}}

Here are all the temperatures observed, and 2013,

plot(B$X,B$Temp,cex=.5,col="light blue",xlab="January, in Montreal",ylab="Temperature (Celsius)")
I=which(B$Year==2013)
lines(B$X[I],B$Temp[I],col="red")

In the previous post, one test only was used, and one year was considered. I was wondering if this behavior was observed only with temperature of 2013 (or not), and how the other tests (mentioned in a previous post too) were performing.

I might need a function, because those tests cannot be used if there is a missing value, even only one. So I did use the value observed one hour before (just to make sure that the tests can be done)

correcty=function(Y){
I=which(is.na(Y))
	if(length(I)==0){Yc=Y}
	if(length(I)>0){Yc=Y;for(i in I) Yc[i]=Yc[i-1]}	
return(Yc)
}

Now, we can compute the p-values, for all the years, and the three different three (keeping in mind that two test if the series is non-stationary, and one if the series is stationary)

DF=matrix(NA,2013-1954,3)
library(urca)
for(y in 1955:2013){
Z=B$Temp[which(B$Year==y)]
	Zc=correcty(Z)
	DF[y-1954,2]=as.numeric(pp.test(Zc)$p.value)	
	DF[y-1954,1]=as.numeric(kpss.test(Zc)$p.value)	
	DF[y-1954,3]=as.numeric(adf.test(Zc)$p.value)
}

Visually, if red means stationary, and blue means non-stationary, we get

DFP=DF
DFP[,1]=DF[,1]<.05
DFP[,2:3]=DF[,2:3]>.05
library(RColorBrewer)
CL=brewer.pal(6, "RdBu")
plot(0:1,0:1,xlim=c(1950,2015),ylim=c(0,3),axes=FALSE,xlab="",ylab="")
axis(1)
text(1952,.5,"KPSS")
text(1952,1.5,"PP")
text(1952,2.5,"ADF")
for(y in 1955:2013){
for(i in 1:3){
polygon(y+c(-1,-1,1,1)/2.2,i-.5+c(-1,1,1,-1)/2.2,col=CL[1+(DFP[y-1954,i]==1)*5],border=NA)}}

Quite frequently, we conclude that the temperature is a random walk. Which does not make sense (from a physical point of view). But again, it might come from the fact that temperature are stationary, but with some fractional behavior (as suggested in the previous post).

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)