Prices of houses in the Netherlands

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The last couple of days I read a number of times about stabilization in house prices which had been dropping due to the crisis. And you get hit by numbers such as change against Q2 2013 or Q3 2012. These are accompanied by reasons why this or that quarter may be special so changes may be off. To be honest, after years of crisis I for one have no clue how prices have changed overall. Hence this small good news was a good reason to grab some data and look for myself.

Data

The data is obtained from the website of NVM, the largest Dutch organization of real estate agents. Their website contains loads of reports and data on the housing market, most of them in Dutch. Since I wanted analyze data I started with grabbing historical data. These are house prices since 1985. Note that the good news was that the number of transactions seemed to increase, not the prices.

The regions are chosen because they reflect some specific properties. These descriptions are purely my own.
Amsterdam is the financial heart and may be sensitive to the financial markets.
Den Haag (The Hague) is the government and related institutions
Rotterdam the harbor, with logistics, may be sensitive to neighbor Germany
Waterland typical location if you work in Amsterdam but don’t want to live there
Almere another typical location to work in Amsterdam, a new grow city, almost doubled in size since 1995
Zuidwest Drente is the countryside
Ede eo is food valley, the more high tech food companies may be found there
Zeeuwse eilanden is more tourism and fishing
Eindhoven is a region with more high tech industries

The historical data is a .pdf per NVM region, covers 1985-2012. These are a bit strange files. I was unable to copy paste the second page of the files, at least into LibreOffice Calc. Plan B was to convert via pdftotext which can be found as part of xpdf. The resulting .txt file is no beauty either, with loads of data on one line, separated by some keywords, then data every other line.
Data for 2013 is not historical. Q3 is in a small report for each region, the new data, which luckily also contains Q2. Q1 was extracted by getting same report for Q2. It seemed most easy to manually extract these in a table and then read them in R.
The code for reading the data is in the appendix.

Analysis

Raw data 

The first plot contains all data, on a logarithmic scale for price. From this plot it seems the prices have increased gigantic. From just over €50000 to €250000 in 30 years. Its also seems there is a linear phase (hence exponential growth) from say 1985 to 1995 followed by an even faster increase till 2002, which flattened till 2007 after which the decrease set in. A long term interpretation might be that the last couple of years prices corrected after the crazy times around the turn of the century.

library(ggplot2)
p <- ggplot(all, aes(x=time, y=price))
p + geom_line() +
    scale_y_log10(limits=c(50,300),breaks=seq(50,300,by=50)) + 
    scale_x_continuous(breaks=seq(1990,2010,by=10),’Year’) +
    facet_wrap(~region)

Smoothed data

I love my smoothers, so using them is almost second nature. I also took the opportunity to restrict myself to the year 2000 to 2013 and moved to a linear scale. These plots do suggest that Amsterdam, Den Haag and Eindhoven found the way up again, but there are just as many regions which keep on dropping in price.
p + stat_smooth(method=’loess’,span=.2) +
    scale_y_continuous(limits=c(130,300),
        ‘Price (thousands Euros)’) + 
    scale_x_continuous(limits=c(2000,2013.8),’Year’) +
    facet_wrap(~region)

Forecasting

I have seen many a blog with forecast, all the more reason to try to use it. It was easy to use. However, I have some fears for anything ‘auto’. It hides a lot. Further analysis (not shown) showed many competing models and models which could not be estimated. Some of these models are close in aicc, yet predict differently. However, time series has been a bit too long ago to really believe I can do better within a reasonable time. Good thing about ‘auto’ is my personal desire to obtain certain effects is also out of the picture.
The plots show an increases in Den Haag, maybe in Amsterdam. However, the intervals are rather wide.

library(forecast)
par(mfrow=c(3,4),mar=c(1, 2.5, 2.5, .1) + 0.1)
sapply(unique(all$regionnum),function(x) {
      regio <- all[all$regionnum==x,]
      regio <- regio[order(regio$time),]
      top <-ts(regio$price,start=c(min(regio$year),1),
          end=c(max(regio$year),3),
          deltat=0.25)
      fit <- auto.arima(top,stepwise=FALSE,approximation=FALSE)
      LH.pred<-forecast(fit,n.ahead=4)
      plot(LH.pred,xlim=c(2000,2015),
          ylim=c(min(min(LH.pred$lower),min(regio$price[regio$year>2000])),
              max(max(LH.pred$upper),max(regio$price[regio$year>2000]))),
          main=regio$region[1])
    })

Conclusion

The prices indicate that we may indeed be close to the bottom. We are getting close to the longer term trend. However, just as they might swing over the long term trend, so they might swing under. We do see some prices bottoming out. Unfortunately, by the time we know for sure, the question if Q3 2013 is the point where it stopped getting worse may be a year or more away, at which point it is not so much of practical importance.

Appendix: reading data

vect <- function(x)   unlist(strsplit(x,' '))
dd <- dir(pattern='nvm.*pdf')
la <- lapply(dd[1:9], function(x) {
      #http://www.foolabs.com/xpdf/
      system(paste(‘pdftotext’,x))
      lin1 <- readLines(  gsub('pdf','txt',x))
      region <- lin1[3]
      blok <- grep('^jaar',lin1,value=TRUE)
      jaar <- substr(blok,regexpr('jaar',blok)+5,regexpr('zuiver',blok)-2)
      blok <- substring(blok,regexpr('zuiver kwartaal',blok)+16)
      zk1 <- substr(blok,1,regexpr('voortschrijdend',blok)-2)
      blok <- substring(blok,regexpr('kwartaal',blok)+9)
      vk1 <- substr(blok,1,regexpr('jaar',blok)-2)
      zk2 <- substring(blok,regexpr('kwartaal',blok)+9)
      vk2 <- grep('^voortschrijdend kwartaal',lin1,value=TRUE)[1]
      vk2 <- substring(vk2,26,regexpr('jaar',vk2)-2)
      
      part1 <- data.frame(jaar=vect(jaar),
          vk1=vect(vk1),
          zk1=vect(zk1),
          vk2=vect(vk2),
          zk2=vect(zk2))
      
      blok <- grep('^periode',lin1,value=TRUE)
      jaar <- substr(blok,9,regexpr('zuiver',blok)-2)
      blok <- substring(blok,regexpr('zuiver kwartaal',blok)+16)
      zk1 <- substr(blok,1,regexpr('voortschrijdend',blok)-2)
      blok <- substring(blok,regexpr('kwartaal',blok)+9)
      vk1 <- substr(blok,1,regexpr('jaar',blok)-2)
      zk2 <- substring(blok,regexpr('kwartaal',blok)+9)
      vk2 <- grep('^voortschrijdend kwartaal',lin1,value=TRUE)[2]
      vk2 <- substring(vk2,26,regexpr('jaar',vk2)-2)
      part2 <- data.frame(jaar=vect(jaar),
          vk1=vect(vk1),
          zk1=vect(zk1),
          vk2=vect(vk2),
          zk2=vect(zk2))
      parts <- rbind(part1,part2)
      parts$region <- region
      parts
    })

# extracting data for Netherlands from region data
nl <- data.frame(
    jaar=la[[1]]$jaar,
    zk1=la[[1]]$zk2,
    region=’Regio 77 Netherlands’
)
# putting together phase 1 

la2 <- lapply(la,function(x) x[,c(1,3,6)]) 
la2[[10]] <- nl
all <- do.call(rbind,la2)

# current data
new <- read.table(textConnection('
“Regio 12 Zuidwest-Drenthe” 184 189 194
“Regio 31 Waterland” 190 209 201
“Regio 34 Amsterdam” 219 224 228
“Regio 37 Almere” 168 171 161
“Regio 46 Den Haag” 213 215 202
“Regio 49 Rotterdam” 172 181 169
“Regio 55 Ede eo” 214 206 224
“Regio 65 Zeeuwse Eilanden” 184 182 184
“Regio 71 Eindhoven eo” 204 211 210
“Regio 77 Netherlands” 205 207 205
‘), col.names=c(‘region’,’13-01′,’13-02′,’13-03′)
)

new <- reshape(new,
    varying=list(names(new[-1])),
    v.names=’price’,
    timevar=’jaar’, 
    idvar=’region’,
    times=c(’13-1′,’13-2′,’13-3′),
    direction=’long’,
)
#combining 2
all$price <- as.numeric(sub(',','.',as.character(all$zk1)))
all <- subset(all,select=-zk1)
all <- rbind(all,new)
# variable coding & conversion
all$year <- as.numeric(substr(all$jaar,1,2))
all$year <- ifelse(all$year<50,all$year+2000,all$year+1900)
all$Quarter <- substr(all$jaar,4,4)
all$time <- all$year+as.numeric(all$Quarter)/4-1/8
all$region <- factor(all$region)
all$regionnum <- as.numeric(substr(all$region,7,9))
levels(all$region) <- substr(levels(all$region),10,100)

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)