[This article was first published on Wiekvoet, and kindly contributed to R-bloggers.]
I like playing around with data from Eurostat. These days the tools make this very easy: the eurostat package pulls the data directly from the database into R, dplyr handles a bit of processing, and before you know it ggplot has made a plot.
My starting point for examining data is the database page. From there I can browse for the correct table and view its contents. Having done that, I can take the name of the table and pull it into R. The name of the vacancy table I chose (Job vacancy statistics – quarterly data (from 2001 onwards), NACE Rev. 2) is jvs_q_nace2, hence with
library(eurostat); library(dplyr); library(ggplot2) # packages used in this post
r1 <- get_eurostat('jvs_q_nace2')
I have all the packages needed and the data in R. One property of the data is that everything is coded, so the next step is to merge in the code labels. The following code pulls the country codes and does a bit of post-processing on the names to make them nicer. Subsequently, the various combinations of countries that result from the enlargement of the EU and the euro area at various points in time are removed. The data set is too abundant, so some of it has to be removed. Finally, seasonally adjusted data is selected and all company sizes are used.
# add country names
r2 <- get_eurostat_dic('geo') %>%
country=gsub(' $','',country)) %>%
# filter countries
s_adj=='SA' ,# seas. adj.
sizeclas!='Total') %>% # all company sizes
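The label clean-up and the removal of aggregates described above can be tried on a toy table. This is a minimal sketch in base R, not the code of this post: the data frame `geo_dic`, its column names, the sample labels, and the pattern that strips parenthesised remarks are all assumptions made up for illustration.

```r
# Toy dictionary; codes and labels are illustrative assumptions,
# not the actual output of get_eurostat_dic('geo').
geo_dic <- data.frame(
  geo = c("DE", "NL", "EU28", "EA19"),
  country = c("Germany (until 1990 former territory of the FRG)",
              "Netherlands ",
              "European Union (28 countries)",
              "Euro area (19 countries)"),
  stringsAsFactors = FALSE
)

# Drop any parenthesised remark, then strip a trailing space
geo_dic$country <- gsub("\\(.*$", "", geo_dic$country)
geo_dic$country <- gsub(" $", "", geo_dic$country)

# Remove the EU and euro-area aggregates by their code prefix
countries <- geo_dic[!grepl("^(EU|EA)", geo_dic$geo), ]
print(countries)
```

Filtering on the code rather than on the label keeps the rule short, but it relies on no real country code starting with EU or EA, which should be checked against the actual dictionary.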
For the other variables it is more or less the same: get_eurostat_dic() pulls the code labels, which can then be merged in. The text in nace is a bit long, so I shortened it.
r3 <- get_eurostat_dic('nace_r2') %>%
nace_r2=V1, # add NACE
r4 <- get_eurostat_dic('indic_em') %>%
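The merge of codes and labels can likewise be sketched on toy data. The sketch assumes that the dictionary arrives with the code in a column `V1`, as the `nace_r2=V1` line suggests. The label column name `V2`, the toy values, and the 20-character cut-off are assumptions for illustration only.

```r
# Toy stand-ins; the values and the column name V2 are assumptions
dic <- data.frame(
  V1 = c("J", "P"),
  V2 = c("Information and communication", "Education"),
  stringsAsFactors = FALSE
)
dat <- data.frame(
  nace_r2 = c("J", "P", "J"),
  values = c(1.2, 0.8, 1.5),
  stringsAsFactors = FALSE
)

# Rename the columns to match the data and shorten long labels
names(dic) <- c("nace_r2", "nace")
dic$nace <- substr(dic$nace, 1, 20)

# Join the labels onto the data by the shared nace_r2 column
merged <- merge(dic, dat, by = "nace_r2")
print(merged)
```

Keeping the original code column next to the shortened label means later filters can still use the compact codes while plots show readable names.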
Since the data is now prepared, the next step is to plot. There are actually far too many categories in nace, so a selection has to be made for display. If you want to know what the different categories are, use
nace <- select(r4,nace_r2,nace) %>% unique()
to display what each category represents. I selected a number of industry-related categories. In addition, some countries have very limited data; these are eliminated.
property=='Job vacancy rate',
nace_r2 %in% c('A-S','B-E','B-S','B-F'),
c('Croatia', 'Greece','Portugal',# limited years
'Switzerland')), # limited classes
facet_wrap( ~ country )+
ylab('Job vacancy rate')+
In the plot the enormous drops for Cyprus, the Czech Republic and Estonia are clearly visible. The Czech Republic is also rebounding quite steeply. The UK had a smaller drop in 2008, but is now at pre-crisis job vacancy rates. In fact, many countries show increases in the job vacancy rate.
Getting a different display is just as easy. Below is the call to get the number of vacancies in education, information and communication, and research. Since the number of vacancies depends strongly on country size, a logarithmic scale is chosen. The countries displayed are slightly different; it appears that not all countries have all data. But the trends are similar to those in the previous plot.
property=='Number of job vacancies',
nace_r2 %in% c('J','M','M_N','P'),
!(country %in% # limited years
ggplot(.,aes(x=time,y=values,color=nace )) +
facet_wrap( ~ country )+
ylab('Number of job vacancies')+
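Why a logarithmic scale helps with counts that depend on country size can be seen with a little arithmetic. The numbers below are made up for illustration and are not Eurostat values: a small and a large country both see their number of vacancies double.

```r
# Made-up vacancy counts for illustration only
small <- c(100, 200)        # small country, before and after
large <- c(100000, 200000)  # large country, before and after

# On a linear axis the change in the large country dwarfs the small one
diff(small)
diff(large)

# On a log10 axis both doublings span the same distance
diff(log10(small))
diff(log10(large))
```

With a shared linear axis the small country's line would look flat, while on the log axis equal relative changes look equal, which is what makes trends comparable across countries.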