Introducing the Kernelheaping Package II

July 13, 2018

(This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers)

In the first part of Introducing the Kernelheaping Package, I showed how to compute and plot kernel density estimates on rounded or interval-censored data using the Kernelheaping package. Now, let's make a big leap forward to the 2-dimensional case. Interval censoring can be generalised to rectangles or even arbitrary shapes, such as counties, zip codes, electoral districts or administrative districts. Standard area-level mapping methods such as choropleth maps suffer from very different area sizes or odd area shapes, which can greatly distort the visual impression. The Kernelheaping package provides a way to convert these area-level data to a smooth point estimate.

Berlin, the German capital, for example, has an open data initiative that makes data on, e.g., demographics publicly available. We first load a dataset on the Berlin population, which can be downloaded from: https://www.statistik-berlin-brandenburg.de/opendata/EWR201512E_Matrix.csv

```r
library(dplyr)
library(fields)
library(ggplot2)
library(Kernelheaping)
library(maptools)
library(RColorBrewer)
library(rgdal)
gpclibPermit()
```

```r
data <- read.csv2("EWR201512E_Matrix.csv")
```

This dataset contains the number of inhabitants in certain age groups for each administrative district. Afterwards, we load a shapefile with these administrative districts, available from: https://www.statistik-berlin-brandenburg.de/opendata/RBS_OD_LOR_2015_12.zip

```r
berlin <- readOGR("RBS_OD_LOR_2015_12/RBS_OD_LOR_2015_12.shp")
```

```r
berlin <- spTransform(berlin, CRS("+proj=longlat +datum=WGS84"))
```

Now, we form our input data set, which contains the area/polygon centers and the variable of interest, whose density should be calculated. In this case, we'd like to calculate the spatial density of people between 65 and 80 years of age:

```r
# One row per polygon: label point (center) coordinates plus the count
dataIn <- lapply(berlin@polygons, function(x) x@labpt) %>%
  do.call(rbind, .) %>%
  cbind(data$E_E65U80)
```

In the next step, we calculate the bivariate kernel density with the "dshapebivr" function (this may take some minutes) using the prepared data and the shapefile:

```r
est <- dshapebivr(data = dataIn, burnin = 5, samples = 10,
                  adaptive = FALSE, shapefile = berlin,
                  gridsize = 325, boundary = TRUE)
```

To plot the map with "ggplot2", we need to perform some additional data preparation steps:

```r
# Attach an id and the counts to the polygons, then flatten them for ggplot2
berlin@data$id <- as.character(berlin@data$PLR)
berlin@data$E_E65U80 <- data$E_E65U80
berlinPoints <- fortify(berlin, region = "id")
berlinDf <- left_join(berlinPoints, berlin@data, by = "id")

# Grid of evaluation points with the estimated density at each point
kData <- data.frame(expand.grid(long = est$Mestimates$eval.points[[1]],
                                lat = est$Mestimates$eval.points[[2]]),
                    Density = as.vector(est$Mestimates$estimate)) %>%
  filter(Density > 0)
```

Now, we are able to plot the density together with the administrative districts:

```r
ggplot(kData) +
  geom_raster(aes(long, lat, fill = Density)) +
  ggtitle("Bivariate density of Inhabitants between 65 and 80 years") +
  scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e")) +
  geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) +
  coord_quickmap()
```
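To make the three-column input layout (x-coordinate, y-coordinate, count) more concrete, here is a minimal base-R sketch on made-up numbers. It puts every count at its area center and smooths with a Gaussian kernel. The function `naive_kde`, the toy matrix and the bandwidth `h` are my own illustrative assumptions; this is a naive baseline and not the algorithm behind `dshapebivr`, which, as far as I understand, redistributes the counts within each polygon over several iterations (hence the `burnin` and `samples` arguments).

```r
# Toy input in the same layout as dataIn: center x, center y, count per area.
# All values are invented for illustration.
toy <- cbind(x = c(0.2, 0.5, 0.8, 0.5),
             y = c(0.3, 0.7, 0.4, 0.2),
             n = c(120, 40, 300, 80))

# Naive estimate: a count-weighted sum of Gaussian product kernels,
# evaluated on a regular grid. The bandwidth h is an arbitrary choice.
naive_kde <- function(dat, gridsize = 50, h = 0.1, lim = c(-0.5, 1.5)) {
  gx <- seq(lim[1], lim[2], length.out = gridsize)
  gy <- seq(lim[1], lim[2], length.out = gridsize)
  w <- dat[, 3] / sum(dat[, 3])   # weights proportional to the counts
  dens <- matrix(0, gridsize, gridsize)
  for (i in seq_len(nrow(dat))) {
    dens <- dens +
      w[i] * outer(dnorm(gx, dat[i, 1], h), dnorm(gy, dat[i, 2], h))
  }
  list(x = gx, y = gy, z = dens)
}

fit <- naive_kde(toy)

# Riemann sum over the grid: a density should integrate to roughly 1
cell <- diff(fit$x[1:2]) * diff(fit$y[1:2])
total_mass <- sum(fit$z) * cell
```

Calling `image(fit)` gives a quick look at the surface. The density map of Berlin above is based on the more refined estimate from `dshapebivr`, not on this naive surface.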

The density map gives a much better overall impression of the distribution of older people than a simple choropleth map:

```r
ggplot(berlinDf) +
  geom_polygon(aes(x = long, y = lat, fill = E_E65U80, group = id)) +
  ggtitle("Number of Inhabitants between 65 and 80 years by district") +
  scale_fill_gradientn(colours = c("#FFFFFF", "#5c87c2", "#19224e"), name = "n") +
  geom_path(color = "#000000", data = berlinDf, aes(long, lat, group = group)) +
  coord_quickmap()
```
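Why area size matters for a choropleth of raw counts can be seen in a small base-R sketch. Two made-up districts hold the same number of people, so they would get the same fill colour, although one is eight times larger. The helper `polygon_area` (shoelace formula) and all numbers are illustrative assumptions and assume planar coordinates.

```r
# Shoelace formula: area of a simple polygon given as a two-column
# matrix of vertices in planar coordinates.
polygon_area <- function(xy) {
  x <- xy[, 1]
  y <- xy[, 2]
  j <- c(2:nrow(xy), 1)   # index of the next vertex, wrapping around
  abs(sum(x * y[j] - x[j] * y)) / 2
}

small <- cbind(c(0, 1, 1, 0), c(0, 0, 1, 1))   # 1 x 1 square
large <- cbind(c(0, 4, 4, 0), c(0, 0, 2, 2))   # 4 x 2 rectangle

counts <- c(small = 500, large = 500)          # identical raw counts
areas <- c(small = polygon_area(small), large = polygon_area(large))

# People per unit area differ by a factor of eight
per_area <- counts / areas
```

A map of raw counts shows both districts in the same shade, while the larger one dominates the picture simply because it covers more space.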

Often, as is the case with Berlin, there are large uninhabited areas such as woods or lakes. Furthermore, we may want to compute the proportion of older people relative to the overall population in a spatial setting. The third part of this series shows how you can compute boundary-corrected and smooth proportion estimates with the Kernelheaping package.
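As a rough intuition for what a smooth proportion is, the following base-R sketch divides a kernel surface of the older inhabitants by a kernel surface of all inhabitants. The function `kernel_surface`, the bandwidth `h` and all numbers are made up for illustration; this is a naive ratio of two surfaces and not the method used by the Kernelheaping package.

```r
# Invented district centers with counts of older and of all inhabitants
centres <- cbind(x = c(0.2, 0.5, 0.8), y = c(0.3, 0.7, 0.4))
older <- c(30, 10, 90)
total <- c(200, 100, 300)

# Unnormalised kernel surface: sum of count-weighted Gaussian kernels
kernel_surface <- function(xy, counts, gx, gy, h = 0.15) {
  surf <- matrix(0, length(gx), length(gy))
  for (i in seq_len(nrow(xy))) {
    surf <- surf +
      counts[i] * outer(dnorm(gx, xy[i, 1], h), dnorm(gy, xy[i, 2], h))
  }
  surf
}

gx <- seq(0, 1, length.out = 40)
gy <- seq(0, 1, length.out = 40)

# Local share of older people: ratio of the two surfaces on the grid
share <- kernel_surface(centres, older, gx, gy) /
  kernel_surface(centres, total, gx, gy)
```

At every grid point, `share` is a weighted average of the district-level proportions (here 0.15, 0.10 and 0.30), with nearby districts weighted more heavily.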

