Urbanisation Continues – Modeling Regional Apartment Prices in Finland

March 7, 2016

(This article was first published on Juuso's blog on Open Data Science and R, and kindly contributed to R-bloggers)

With all the data science and big data hype going on, it's always nice to see real case examples. At Reaktor we have created Kannattaakokauppa.fi, an interactive visualisation of regional apartment price trends in Finland, based on probabilistic modelling. The service shows the predicted price levels and trends for the year 2017, and zooming in to a zip code shows its price development since 2005.


Screenshot of Kannattaakokauppa.fi.

The visualisation is based on a hierarchical probabilistic model (implemented with R and Stan) of open data from Statistics Finland. For a thorough description of the model, see the blog post at rOpenGov by my colleague Janne Sinkkonen, who did the main modelling work. All R source code is available on GitHub. More technical details are at the end of this post. We published the first version of the visualisation last spring, and have now updated it with data from 2015.

One clear finding from our analysis is that apartment prices are increasing faster in regions with high population density. This "urbanisation" trend is clearly visible in the visualisation (click 'Trend2017'), with most of Finland showing decreasing prices and only the largest cities showing increasing prices.

Filling holes with statistical modeling

The raw data from Statistics Finland has a lot of holes, because prices are not reported openly if there are too few transactions per year in a zip code. Many of the reported values are also likely to contain notable random variation.

Luckily, with a hierarchical probabilistic model we can predict the missing data by generalising information across zip codes to fill the holes. As a result, we get a good overview of the regional apartment price development in Finland, and also more reliable price levels for individual zip codes.
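To give a feel for what "generalising information across zip codes" means, here is a minimal Python sketch of partial pooling. It is not the Stan model described in the rOpenGov post: it ignores time trends, population density and the regional hierarchy, and it simply shrinks each zip code's mean price toward the overall mean. The function name, the prior_strength constant and all prices are made up for illustration.

```python
from statistics import mean


def pooled_estimates(observations, prior_strength=5.0):
    """Shrink each zip code's mean price toward the overall mean.

    observations: dict mapping zip code -> list of observed prices (may be empty).
    prior_strength: hypothetical tuning constant; it acts like a number of
        pseudo-observations placed at the overall mean.
    """
    all_prices = [p for prices in observations.values() for p in prices]
    overall = mean(all_prices)
    estimates = {}
    for zip_code, prices in observations.items():
        n = len(prices)
        # The weight on the local data grows with the number of transactions.
        weight = n / (n + prior_strength)
        local = mean(prices) if n else overall
        estimates[zip_code] = weight * local + (1 - weight) * overall
    return estimates


# Made-up prices in EUR per square metre, for illustration only.
example = {
    "00100": [6000.0, 6200.0, 5800.0, 6100.0, 5900.0],
    "00200": [4000.0],
    "00300": [],  # a "hole": no reported transactions
}
estimates = pooled_estimates(example)
```

In this toy version the zip code with no transactions receives the overall mean, and the zip code with a single transaction is pulled strongly toward it, which is the same kind of borrowing of strength the hierarchical model performs in a more principled way.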

To illustrate the advantages of the modelling approach, I created the following animation showing the raw data side by side with the modelled data:


The animation is made using the gganimate R package (source code).

Data sources and R packages

Apartment price data for the postal codes is from the Statistics Finland open data API (see Terms of Use).
Postal code region names, municipalities and population data are from Statistics Finland Paavo – Open data by postal code area. The postal code area map is from Duukkis and is licensed under CC BY 4.0.

The data sets are accessed via the R packages pxweb and gisfin from the rOpenGov project.

The final interactive visualisation was created by Janne Aukia with JavaScript. For this, the data was processed in R and written into JSON files. The spatial data files were first written as GeoJSON files and then transformed to TopoJSON to reduce loading times.
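As a rough illustration of the export step, the following Python sketch builds a small GeoJSON FeatureCollection and serialises it to text. The actual processing was done in R, so this is only an assumed shape of the output: the property names price_2017 and trend_2017, the coordinates and the values are all hypothetical. The TopoJSON conversion is not shown here.

```python
import json


def to_feature(zip_code, ring, properties):
    """Build one GeoJSON Feature for a postal code area.

    ring: list of [longitude, latitude] pairs forming a closed ring.
    """
    return {
        "type": "Feature",
        "id": zip_code,
        "properties": properties,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# A made-up square and made-up values, for illustration only.
ring = [
    [24.90, 60.10],
    [25.00, 60.10],
    [25.00, 60.20],
    [24.90, 60.20],
    [24.90, 60.10],  # the ring closes on its first point
]
collection = {
    "type": "FeatureCollection",
    "features": [
        to_feature("00100", ring, {"price_2017": 6000.0, "trend_2017": 0.02}),
    ],
}

# Serialise to GeoJSON text; this string could be written to a .geojson file.
geojson_text = json.dumps(collection)
```

GeoJSON repeats every shared border in both neighbouring polygons, whereas TopoJSON stores shared borders once, which is one reason the converted files are smaller.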

