Unemployment in Europe

February 2, 2016

(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

For a couple of years now I have made plots of unemployment and its change over the years. At first this was a big and complex piece of code. As things have progressed, the code has become pretty concise; there are plenty of packages to do the heavy lifting. So, this year I tried to make the code easy to read and reasonably documented.

Data

Data is from Eurostat. Since we have the joy of the eurostat package, suffice it to say that this is dataset une_rt_m. Since the get_eurostat function gives me codes for things such as country and gender, the first step is to use a dictionary to decode them. Subsequently, the country names are sanitized a bit and the data is selected.
library(eurostat)
library(ggplot2)
library(KernSmooth)
library(plyr)  # load plyr before dplyr, so dplyr verbs are not masked
library(dplyr)

library(scales) # to access breaks/formatting functions

r1 <- get_eurostat('une_rt_m') %>%
    mutate(.,geo=as.character(geo)) # character preferred for merge
r2 <- get_eurostat_dic('geo') %>%
    rename(.,geo=V1) %>%
    mutate(.,
# part of country name within parentheses removed
        country=gsub('\\(.*$','',V2),
        country=gsub(' $','',country),
        country=ifelse(geo=='EA19',paste(country,'(19)'),country)) %>%
    select(.,geo,country) %>%
    right_join(.,r1) %>%
# keep only total, drop sexes
    filter(.,sex=='T') %>%
# filter out old Euro area and keep only EU28, EA19
    filter(.,!grepl('EA..',geo) | geo=='EA19') %>%
    filter(.,!(geo %in% c('EU15','EU25','EU27')) ) %>%
# SA is seasonally adjusted
    filter(.,s_adj=='SA') %>%
    mutate(.,country=factor(country)) %>%
    select(.,-sex,-s_adj)
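
To see what the name clean-up and the area filter do without downloading anything, here is a minimal sketch on a hand-made lookup table. The labels are typed in by hand as stand-ins and are not actual Eurostat output, and the column name `label` is an assumption made for this example only.

```r
# Toy lookup table; labels are hand-typed stand-ins, not real Eurostat output
dic <- data.frame(
    geo = c('DE', 'EA19', 'EA18', 'EU28', 'EU27', 'NL'),
    label = c('Germany (until 1990 former territory of the FRG)',
        'Euro area (19 countries)',
        'Euro area (18 countries)',
        'European Union (28 countries)',
        'European Union (27 countries)',
        'Netherlands'),
    stringsAsFactors = FALSE)

# drop everything from the first opening parenthesis, then a trailing space
dic$country <- gsub('\\(.*$', '', dic$label)
dic$country <- gsub(' $', '', dic$country)
# put the member count back for the current Euro area
dic$country <- ifelse(dic$geo == 'EA19',
    paste(dic$country, '(19)'), dic$country)

# keep EA19 but no other Euro area aggregate, and drop older EU aggregates
keep <- (!grepl('EA..', dic$geo) | dic$geo == 'EA19') &
    !(dic$geo %in% c('EU15', 'EU25', 'EU27'))
cleaned <- dic[keep, c('geo', 'country')]
print(cleaned)
```

Note the doubled backslash: inside an R string, the regular expression for a literal opening parenthesis has to be written as `'\\('`.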

Plots

To make plots I want smoothed data. ggplot2 can do this, but I prefer to have the same smoothing for all curves, so it is done before entering ggplot. There are rather many countries, hence the number is reduced to 36, which are displayed in three plots of 3*4 panels, for countries with low, middle and high maximum unemployment respectively. Two smoothers are applied: one for the smoothed data, the second for its first derivative. The derivative is smoothed more strongly, to avoid extreme fluctuations.
# add 3 categories for the 3 3*4 displays
r3 <- aggregate(r2$values,by=list(geo=r2$geo),FUN=max,na.rm=TRUE) %>%
    mutate(.,class=cut(x,quantile(x,seq(0,3)/3),
            include.lowest=TRUE,
            labels=c('low','middle','high'))) %>%
    select(.,-x) %>% # maxima not needed any more
    right_join(.,r2)
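
The split into three classes can be tried on its own. Below is a small sketch with made-up maxima for six hypothetical country codes; the numbers are invented for illustration and the data frame name `maxima` is not part of the code above.

```r
# Hypothetical maxima per country code (made-up numbers for illustration)
maxima <- data.frame(
    geo = c('A', 'B', 'C', 'D', 'E', 'F'),
    x = c(4, 7, 9, 12, 18, 27),
    stringsAsFactors = FALSE)

# cut points at the 0, 1/3, 2/3 and 1 quantiles give three equal-sized groups
cuts <- quantile(maxima$x, seq(0, 3) / 3)
maxima$class <- cut(maxima$x, cuts,
    include.lowest = TRUE,   # otherwise the minimum would fall outside
    labels = c('low', 'middle', 'high'))
print(maxima)
```

Because the cut points are quantiles of the maxima themselves, each class gets about a third of the countries, which is what makes three equally filled displays possible.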

# locpoly to make the smoothing the same for all countries
Perc <- ddply(.data=r3,.variables=.(age,geo), 
    function(piece,...) {
      piece <- piece[!is.na(piece$values),]
      # drv=0: smoothed level; bandwidth is in days
      lp <- locpoly(x=as.numeric(piece$time),y=piece$values,
          drv=0,bandwidth=90)
      sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),
          sPerc=lp$y,
          age=piece$age[1],
          geo=piece$geo[1],
          country=piece$country[1],
          class=piece$class[1])}
    ,.inform=FALSE
)

# locpoly for derivative too

dPerc <- ddply(.data=r3,.variables=.(age,geo), 
    function(piece,...) {
      piece <- piece[!is.na(piece$values),]
      # drv=1: first derivative; wider bandwidth for more smoothing
      lp <- locpoly(x=as.numeric(piece$time),y=piece$values,
          drv=1,bandwidth=365/2)
      sdf <- data.frame(Date=as.Date(lp$x,origin='1970-01-01'),
          dPerc=lp$y,          
          age=piece$age[1],
          geo=piece$geo[1],
          country=piece$country[1],
          class=piece$class[1])}
    ,.inform=FALSE
)
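
For a feel of what the two locpoly calls return, here is a minimal sketch on a single synthetic series. The sine-shaped rate is invented for illustration and is not Eurostat data; the bandwidths are the same as above.

```r
library(KernSmooth)

# Synthetic monthly series: dates as days since 1970-01-01, a made-up rate
time <- seq(as.Date('2000-01-01'), as.Date('2015-12-01'), by = 'month')
x <- as.numeric(time)
rate <- 8 + 2 * sin(2 * pi * x / (365.25 * 8))

# drv = 0 returns the smoothed curve, drv = 1 its first derivative;
# the bandwidth is in the units of x, here days
level <- locpoly(x = x, y = rate, drv = 0, bandwidth = 90)
slope <- locpoly(x = x, y = rate, drv = 1, bandwidth = 365 / 2)

# both fits are evaluated on the same default grid of 401 points
smoothed <- data.frame(
    Date = as.Date(level$x, origin = '1970-01-01'),
    sPerc = level$y,
    dPerc = slope$y)
head(smoothed)
```

Since x is in days, the derivative is in percentage points per day; multiply by 365 if a change per year reads more naturally.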

The plots are made per class, each written to its own PNG file.

for (i in c('low','middle','high')) {
  png(paste(i,'.png',sep=''))
  g <- filter(Perc,class==i) %>%
      ggplot(.,
          aes(x=Date,y=sPerc,colour=age)) +
      facet_wrap( ~ country, drop=TRUE) +
      geom_line()  +
      theme(legend.position = "bottom")+
      ylab('% Unemployment') + xlab('Year') +
      scale_x_date(breaks = date_breaks("5 years"),
          labels = date_format("%y")) 
  print(g)
  dev.off()
}
for (i in c('low','middle','high')) {
  png(paste('d',i,'.png',sep=''))
  g <- filter(dPerc,class==i) %>%
      ggplot(.,
          aes(x=Date,y=dPerc,colour=age)) +
      facet_wrap( ~ country, drop=TRUE) +
      geom_line()  +
      theme(legend.position = "bottom")+
      ylab('Change in % Unemployment') + xlab('Year')+
      scale_x_date(breaks = date_breaks("5 years"),
          labels = date_format("%y"))
  print(g)
  dev.off()
}
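
The plot itself can be tried without the Eurostat data. The sketch below uses made-up series for two hypothetical countries and two hypothetical age groups. It also assumes a ggplot2 version of 2.0.0 or later, where scale_x_date takes date_breaks and date_labels arguments directly, so the scales helper functions are not needed; that is a variation on the code above, not what the loops use.

```r
library(ggplot2)

# Made-up smoothed series for two hypothetical countries and two age groups
toy <- expand.grid(
    Date = seq(as.Date('2000-01-01'), as.Date('2015-12-01'), by = 'month'),
    country = c('Country A', 'Country B'),
    age = c('Total', 'Under 25'),
    stringsAsFactors = FALSE)
toy$sPerc <- 8 + 4 * (toy$age == 'Under 25') +
    2 * sin(as.numeric(toy$Date) / 500)

g <- ggplot(toy, aes(x = Date, y = sPerc, colour = age)) +
    facet_wrap(~ country) +
    geom_line() +
    theme(legend.position = 'bottom') +
    ylab('% Unemployment') + xlab('Year') +
    # a tick every five years, labelled with the two-digit year
    scale_x_date(date_breaks = '5 years', date_labels = '%y')

# print(g) draws the plot on the active graphics device
```

Inside a for loop the explicit print(g) is required, since ggplot objects are only drawn automatically at the top level.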

Results

In general, things are improving, which is good news, though there is still a way to go. As always, Eurostat has a nice document on this topic; its authors are certainly more knowledgeable than me.

Average unemployment

First derivative
