Multivariate analysis of death rate on the map of Europe

January 11, 2015
By

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

Eurostat has information on death rates by cause and NUTS 2 region. I am trying to get this visually displayed on the map. To get there I map all causes to three dimensions via a principal components analysis. These three dimensions are subsequently translated in RGB colors and placed in the map of Europe.

Data

Death rates are from Eurostat table causes of death. I took crude death rate. On that website, from default setup I removed gender and added causes. Then I transposed causes to rows and regions to columns so I would not be bothered by translation of invalid column names. I exported those as .xls, but got only part of the regions. My second attempt was export as .csv. In general I prefer to take .xls from Eurostat as they separate thousands via a ‘,’, however, incomplete was not acceptable, so the .csv is used.
For the mapping the scipt used is based on rpubs: Mapping Data from Eurostat using R. However, some changes were needed as Eurostat reorganized there website. Another adaptation is that I removed all Frances’ overseas parts. These parts gave too much whitespace for my taste.

Result

The resulting map is nice to see, there is certainly a structure. Regions close to each other have similar colors hence similar death rates. However color interpretation itself is difficult. Most important of all, these are comparisons between regions. 

To explain the colors I did something similar as for the map itself. In order to avoid crowding the figure, only forty of the causes are displayed.

Code

library(“rgdal”)
library(“RColorBrewer”)
library(“GISTools”)
library(“classInt”)
library(dplyr)

temp <- tempfile(fileext = “.zip”)
# now download the zip file from its location on the Eurostat website and
# put it into the temp object
# new Eurostat website
# old: http://epp.eurostat.ec.europa.eu
# new: http://ec.europa.eu/eurostat
download.file(“http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/NUTS_2010_60M_SH.zip”, 
    temp)
# now unzip the boundary data
unzip(temp)

EU_NUTS <- readOGR(dsn = “./NUTS_2010_60M_SH/data”, layer = “NUTS_RG_60M_2010”)
ToRemove <- [email protected]$STAT_LEVL!=2 | grepl(‘FR9’,[email protected]$NUTS_ID)
EUN <- EU_NUTS[!ToRemove,]

###
dir()
r1 <- read.csv(‘hlth_cd_acdr2_1_Data.csv’)
r1$Value <- as.character(r1$Value) %>%
   gsub(‘,’,”,.) %>%
   gsub(‘:’,’0′,.) %>%
   as.numeric(.)
r2 <- reshape(r1,direction=’wide’,
    idvar=c(‘ICD10′,’ICD10_LABEL’),
    v.names=’Value’,
    timevar=’GEO’,
    drop=c(‘UNIT’,’TIME’,’AGE’,’SEX’,’Flag.and.Footnotes’))
names(r2) <- gsub(‘Value.’,”,names(r2),fixed=TRUE)

r2 <- r2[!(r2$ICD10 %in% 
          c(‘A-R_V-Y’,’A-B’,’C’,’E’,’F’,’G-H’,’I’,’J’,’K’,’M’,’N’,’R’,’V01-Y89′,’ACC’))
    ,]
row.names(r2) <- r2$ICD10_LABEL
m1 <- as.matrix(r2[,-(1:2)]) %>% t(.)

m2 <- m1[rownames(m1) %in% [email protected]$NUTS_ID,]
pr1 <- princomp(m2,cor=TRUE) 
#plot(pr1)  
#biplot(pr1)
#plot(pr1$loadings[,1],pr1$loadings[,2])
myscale <-function(x) 
  (x-min(x))/(max(x)-min(x))

tom <- data.frame(
    loc=as.character(rownames(pr1$scores)),
    rgbr=myscale(pr1$scores[,1]),
    rgbg=myscale(pr1$scores[,2]),
    rgbb=myscale(pr1$scores[,3])) 
tom$rgb <- with(tom,rgb(rgbr,rgbg,rgbb))

[email protected] = data.frame([email protected][,1:4], tom[
        match([email protected][, “NUTS_ID”],tom[, “loc”]),   ])

png(‘map.png’)
par(mar=rep(0,4))
plot <- plot(EUN, col = [email protected]$rgb, 
    axes = FALSE, border = NA)    
dev.off()
load <- as.data.frame(as.matrix(pr1$loadings[,1:3]))
load$name  <- rownames(pr1$loadings) 
load <- mutate(load,
    rgbr=myscale(Comp.1),
    rgbg=myscale(Comp.2),
    rgbb=myscale(Comp.3), 
    rgb =rgb(rgbr,rgbg,rgbb))
range(load$Comp.1)
png(‘legend2.png’,
    width=960,
    height=960,
    res=144)
par(mar=rep(0,4))
with(load,plot(x=Comp.1,y=Comp.2,type=’n’,
        xlim=range(Comp.1)*1.2))
sample_n(load,40) %>%
    with(.,text(x=Comp.1,y=Comp.2,
    labels=name,
    col=rgb,cex=.5))
dev.off()

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)