Prices of houses in the Netherlands
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The last couple of days I read a number of times about stabilization in house prices which had been dropping due to the crisis. And you get hit by numbers such as change against Q2 2013 or Q3 2012. These are accompanied by reasons why this or that quarter may be special so changes may be off. To be honest, after years of crisis I for one have no clue how prices have changed overall. Hence this small good news was a good reason to grab some data and look for myself.
Data
The data is obtained from the website of NVM, the largest Dutch organization of real estate agents. Their website contains loads of reports and data on the housing market, most of them in Dutch. Since I wanted analyze data I started with grabbing historical data. These are house prices since 1985. Note that the good news was that the number of transactions seemed to increase, not the prices.
The regions are chosen because they reflect some specific properties. These descriptions are purely my own.
Amsterdam is the financial heart and may be sensitive to the financial markets.
Den Haag (The Hague) is the government and related institutions
Rotterdam the harbor, with logistics, may be sensitive to neighbor Germany
Waterland typical location if you work in Amsterdam but don’t want to live there
Almere another typical location to work in Amsterdam, a new grow city, almost doubled in size since 1995
Zuidwest Drente is the countryside
Ede eo is food valley, the more high tech food companies may be found there
Zeeuwse eilanden is more tourism and fishing
Eindhoven is a region with more high tech industries
The historical data is a .pdf per NVM region, covers 1985-2012. These are a bit strange files. I was unable to copy paste the second page of the files, at least into LibreOffice Calc. Plan B was to convert via pdftotext which can be found as part of xpdf. The resulting .txt file is no beauty either, with loads of data on one line, separated by some keywords, then data every other line.
Data for 2013 is not historical. Q3 is in a small report for each region, the new data, which luckily also contains Q2. Q1 was extracted by getting same report for Q2. It seemed most easy to manually extract these in a table and then read them in R.
The code for reading the data is in the appendix.
Analysis
Raw data
Smoothed data
Forecasting
Conclusion
Appendix: reading data
vect <- function(x) unlist(strsplit(x,' '))
dd <- dir(pattern='nvm.*pdf')
la <- lapply(dd[1:9], function(x) {
#http://www.foolabs.com/xpdf/
system(paste(‘pdftotext’,x))
lin1 <- readLines( gsub('pdf','txt',x))
region <- lin1[3]
blok <- grep('^jaar',lin1,value=TRUE)
jaar <- substr(blok,regexpr('jaar',blok)+5,regexpr('zuiver',blok)-2)
blok <- substring(blok,regexpr('zuiver kwartaal',blok)+16)
zk1 <- substr(blok,1,regexpr('voortschrijdend',blok)-2)
blok <- substring(blok,regexpr('kwartaal',blok)+9)
vk1 <- substr(blok,1,regexpr('jaar',blok)-2)
zk2 <- substring(blok,regexpr('kwartaal',blok)+9)
vk2 <- grep('^voortschrijdend kwartaal',lin1,value=TRUE)[1]
vk2 <- substring(vk2,26,regexpr('jaar',vk2)-2)
part1 <- data.frame(jaar=vect(jaar),
vk1=vect(vk1),
zk1=vect(zk1),
vk2=vect(vk2),
zk2=vect(zk2))
blok <- grep('^periode',lin1,value=TRUE)
jaar <- substr(blok,9,regexpr('zuiver',blok)-2)
blok <- substring(blok,regexpr('zuiver kwartaal',blok)+16)
zk1 <- substr(blok,1,regexpr('voortschrijdend',blok)-2)
blok <- substring(blok,regexpr('kwartaal',blok)+9)
vk1 <- substr(blok,1,regexpr('jaar',blok)-2)
zk2 <- substring(blok,regexpr('kwartaal',blok)+9)
vk2 <- grep('^voortschrijdend kwartaal',lin1,value=TRUE)[2]
vk2 <- substring(vk2,26,regexpr('jaar',vk2)-2)
part2 <- data.frame(jaar=vect(jaar),
vk1=vect(vk1),
zk1=vect(zk1),
vk2=vect(vk2),
zk2=vect(zk2))
parts <- rbind(part1,part2)
parts$region <- region
parts
})
# extracting data for Netherlands from region data
nl <- data.frame(
jaar=la[[1]]$jaar,
zk1=la[[1]]$zk2,
region=’Regio 77 Netherlands’
)
# putting together phase 1
la2 <- lapply(la,function(x) x[,c(1,3,6)])
la2[[10]] <- nl
all <- do.call(rbind,la2)
# current data
new <- read.table(textConnection('
“Regio 12 Zuidwest-Drenthe” 184 189 194
“Regio 31 Waterland” 190 209 201
“Regio 34 Amsterdam” 219 224 228
“Regio 37 Almere” 168 171 161
“Regio 46 Den Haag” 213 215 202
“Regio 49 Rotterdam” 172 181 169
“Regio 55 Ede eo” 214 206 224
“Regio 65 Zeeuwse Eilanden” 184 182 184
“Regio 71 Eindhoven eo” 204 211 210
“Regio 77 Netherlands” 205 207 205
‘), col.names=c(‘region’,’13-01′,’13-02′,’13-03′)
)
new <- reshape(new,
varying=list(names(new[-1])),
v.names=’price’,
timevar=’jaar’,
idvar=’region’,
times=c(’13-1′,’13-2′,’13-3′),
direction=’long’,
)
#combining 2
all$price <- as.numeric(sub(',','.',as.character(all$zk1)))
all <- subset(all,select=-zk1)
all <- rbind(all,new)
# variable coding & conversion
all$year <- as.numeric(substr(all$jaar,1,2))
all$year <- ifelse(all$year<50,all$year+2000,all$year+1900)
all$Quarter <- substr(all$jaar,4,4)
all$time <- all$year+as.numeric(all$Quarter)/4-1/8
all$region <- factor(all$region)
all$regionnum <- as.numeric(substr(all$region,7,9))
levels(all$region) <- substr(levels(all$region),10,100)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.