(This article was first published on **Juuso's blog on Open Data Science and R**, and kindly contributed to R-bloggers)

With all the data science and big data hype going on, it’s always nice to see real case examples. At Reaktor we have created Kannattaakokauppa.fi, an interactive visualisation of regional apartment price trends in Finland, based on probabilistic modelling. The service shows the predicted price levels and trends for the year 2017, and zooming in to a chosen zip code shows the price development since 2005.

*Screenshot of Kannattaakokauppa.fi.*

The visualisation is based on a hierarchical probabilistic model (implemented with R and Stan) of open data from Statistics Finland. For a thorough description of the model, see the blog post at rOpenGov by my colleague Janne Sinkkonen, who did the main modelling work. All R source code is available on GitHub, and more technical details are given at the end of this post. We published the first version of the visualisation last spring, and have now updated it with data from the year 2015.

One clear finding from our analysis is that apartment prices are increasing faster in regions with high population density. This “urbanisation” trend is clearly visible in the visualisation (click ‘Trend2017’), with most of Finland showing decreasing prices and only the largest cities showing increasing prices.

## Filling holes with statistical modeling

The raw data from Statistics Finland has a lot of holes, because prices are not reported openly if there are too few transactions per year in a zip code. Many of the reported values are also likely to contain notable random variation.

Luckily, with a hierarchical probabilistic model we can predict the missing data by generalising information across zip codes to fill the holes. As a result, we get a good overview of the regional apartment price development in Finland, and also more reliable price levels for individual zip codes.
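The idea of generalising across zip codes can be illustrated with a toy partial-pooling calculation in base R. This is only a minimal sketch and not the actual Stan model: the zip codes, prices, transaction counts and the two standard deviations (`sigma`, `tau`) are all made-up assumptions, and the real model also includes time trends and other structure. Each zip code's estimate is a weighted average of its own raw mean and the regional mean, with the weight growing with the number of transactions, so an unreported zip code simply falls back to the regional mean.

```r
# Toy illustration of partial pooling; NOT the actual Stan model.
# All values below are invented for the example.
zips <- data.frame(
  zip      = c("00100", "00200", "00300", "00400"),
  n_sales  = c(120, 8, 0, 2),            # number of transactions observed
  raw_mean = c(5200, 4100, NA, 6900)     # raw mean price (EUR/m2); NA = not reported
)

sigma <- 1200  # assumed sd of individual sale prices within a zip code
tau   <- 600   # assumed sd of true zip-code means around the regional mean

# Regional mean: sales-weighted average over the zip codes that are reported
reported <- !is.na(zips$raw_mean)
mu <- weighted.mean(zips$raw_mean[reported], zips$n_sales[reported])

# Shrinkage weight: precision of the data relative to total precision.
# Zip codes with no sales get weight 0 and are filled with the regional mean.
precision_data  <- zips$n_sales / sigma^2
precision_prior <- 1 / tau^2
w <- precision_data / (precision_data + precision_prior)

# Replace NA by 0 only so the arithmetic works; its weight is 0 anyway
raw_or_zero <- ifelse(reported, zips$raw_mean, 0)
zips$pooled <- w * raw_or_zero + (1 - w) * mu

print(zips)
```

In this sketch the zip code with 120 sales keeps an estimate close to its raw mean, while the one with only 2 sales is pulled strongly towards the regional mean, which is the same qualitative behaviour that makes the modelled price levels less noisy than the raw ones.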

To illustrate the advantages of the modelling approach, I created the following animation showing the raw data side by side with the modelled data:

The animation is made using the gganimate R package (source code).

## Data sources and R packages

Apartment price data for the postal codes comes from the Statistics Finland open data API (see Terms of Use).

Postal code region names, municipalities and population data are from Statistics Finland Paavo – Open data by postal code area. The postal code area map is from Duukkis and licensed under CC BY 4.0.

The data sets are accessed via the R packages pxweb and gisfin from the rOpenGov project.

The final interactive visualisation was created by Janne Aukia with JavaScript. For this, the data was processed in R and written into JSON files. The spatial data files were first written as GeoJSON files and finally transformed to TopoJSON to reduce loading times.
