
I just posted an interesting look at the growth of the labor force by decade. Since I used R to produce it, I thought it would be interesting to share the R code and method. For reference, here is the chart:

First, we will need the following libraries:

library(quantmod)
library(tidyverse)
library(extrafont)


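The quantmod library is a significant reason that I keep using R, and you are about to see why. The call below retrieves the civilian labor force series (CLF16OV) from FRED, the data service of the Federal Reserve Bank of St. Louis. By default, getSymbols() stores the result in the workspace as an xts object named after the symbol, here CLF16OV.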

getSymbols('CLF16OV', src = 'FRED')


From here we will break the levels into decades using the window() function.

labor_force_50s <- window( CLF16OV, start='1950-01-01', end='1959-12-31' )
labor_force_60s <- window( CLF16OV, start='1960-01-01', end='1969-12-31' )
labor_force_70s <- window( CLF16OV, start='1970-01-01', end='1979-12-31' )
labor_force_80s <- window( CLF16OV, start='1980-01-01', end='1989-12-31' )
labor_force_90s <- window( CLF16OV, start='1990-01-01', end='1999-12-31' )
labor_force_00s <- window( CLF16OV, start='2000-01-01', end='2009-12-31' )
labor_force_10s <- window( CLF16OV, start='2010-01-01', end='2019-12-31' )


It would be prettier to use a loop to generate these variables, but I didn’t do that because I was being mentally lazy…
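
For what it's worth, here is a minimal sketch of what that loop might look like. It is not the code behind the chart: the series is synthetic (made-up values, so the sketch runs without a download), and the names series, decade_starts, decade_names, and labor_force_by_decade are hypothetical. It collects the windows in a named list rather than in separate variables.

```r
library(xts)

# Synthetic monthly series standing in for CLF16OV (assumption: made-up
# values, used only so the sketch runs offline)
dates  <- seq(as.Date("1948-01-01"), as.Date("2019-12-01"), by = "month")
series <- xts(seq_along(dates), order.by = dates)

decade_starts <- seq(1950, 2010, by = 10)
decade_names  <- c("50s", "60s", "70s", "80s", "90s", "00s", "10s")

# One window() call per decade, collected in a named list
labor_force_by_decade <- lapply(decade_starts, function(y) {
  window(series,
         start = as.Date(paste0(y, "-01-01")),
         end   = as.Date(paste0(y + 9, "-12-31")))
})
names(labor_force_by_decade) <- decade_names

# Number of monthly observations in each decade
sapply(labor_force_by_decade, nrow)
```

With the real data, series would simply be CLF16OV, and a single decade is then available as, for example, labor_force_by_decade[["70s"]].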

From here it's time to organize the data. We ultimately want to convert the values to rates of change by decade. To do that, we’ll need to

2. convert the value to a percent change, and
3. sum the change.

At the end, we need to combine it all back into one data frame (or tibble, since I am trying to use tidyverse now).

Here we go. First, let’s do the thing I was too lazy to do before: store the decade labels in a character vector.

names <- c('50s', '60s', '70s', '80s', '90s', '00s', '10s')


Then, we’ll run a loop that pulls the data by decade. Note the change variable: it takes the values and converts them to percent changes using the Delt() function. Here we encounter a problem, because cumsum() returns NA for every position from the first NA onward. We know there will be one, because Delt() returns NA in the first position (there is no prior value to compare against). So after pushing the values through the Delt() function, we tell R to replace NA with 0. Finally, we take the cumulative sum of the result and store it as change. Last, we store everything in a tibble (data frame) and assign it to the variable name we had before.

for(i in 1:length(names))
{
  data <- get( paste( 'labor_force_', names[i], sep='' ) ) # Load the data
  dates <- index(data) # Separate dates
  values <- as.numeric( coredata(data) ) # Separate values as a plain numeric vector
  month_of_decade <- seq_along(dates) # Month number within the decade (1 to 120 for a full decade)
  change <- values %>% # Take the values
    Delt() %>% # Convert to rate of change
    ifelse( is.na(.) == TRUE, 0, .) %>% # Replace NA with 0
    cumsum() # Return the cumulative sum

  # Store it all in a tibble
  # Month and Decade are the columns that the ggplot() call maps to x and colour
  data1 <- tibble( Date = dates,
                   Value = values,
                   Change = change,
                   Month = month_of_decade,
                   Decade = names[i] )
  # Write it to the variable name it had before
  assign( paste('labor_force_', names[i], sep=''),
          data1 )

}
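
To see why the NA replacement matters, here is a toy example in base R. The four levels are made up, and the percent change is computed by hand as a stand-in for Delt(), so nothing here depends on quantmod.

```r
values <- c(100, 102, 101, 105)   # made-up levels

# Month-over-month percent change; like Delt(), the first position is NA
# because there is no prior value to compare against
pct_change <- c(NA, diff(values) / head(values, -1))

# Without the replacement, the leading NA propagates through every element
all_na <- cumsum(pct_change)

# Replace NA with 0 (the same effect as the ifelse() step), then accumulate
pct_change[is.na(pct_change)] <- 0
change <- cumsum(pct_change)
change
```

Note that summing monthly percent changes only approximates the compounded change: the last value of change here is about 0.0498, while 105 / 100 - 1 is 0.05. Over a decade of small monthly moves the two stay close, but they are not the same number.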


Now, we bind it all together using bind_rows().

labor_force <- bind_rows( labor_force_50s,
                          labor_force_60s,
                          labor_force_70s,
                          labor_force_80s,
                          labor_force_90s,
                          labor_force_00s,
                          labor_force_10s )


Again, this could be done more efficiently, but I’m lazy today.
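
If you are curious what "more efficiently" might look like, here is one possible sketch using dplyr grouping in place of get(), assign(), and bind_rows(). It is an alternative, not the code that produced the chart. The input is synthetic (made-up levels growing 0.1% per month, so it runs offline), the names raw and labor_force_alt are hypothetical, and the percent change is computed with dplyr::lag() rather than Delt().

```r
library(dplyr)

# Synthetic monthly levels standing in for CLF16OV (assumption: made-up
# values growing 0.1% per month, used only so the sketch runs offline)
dates <- seq(as.Date("1948-01-01"), as.Date("2019-12-01"), by = "month")
raw   <- tibble(Date = dates, Value = 60000 * 1.001^(seq_along(dates) - 1))

labor_force_alt <- raw %>%
  # Keep the same span as the seven window() calls
  filter(Date >= as.Date("1950-01-01"), Date <= as.Date("2019-12-31")) %>%
  # Label each row with its decade: "50s", ..., "90s", "00s", "10s"
  mutate(Year   = as.integer(format(Date, "%Y")),
         Decade = sprintf("%02ds", (Year %/% 10L * 10L) %% 100L)) %>%
  group_by(Decade) %>%
  # Within each decade: month counter, then cumulative sum of percent
  # changes with the leading NA replaced by 0
  mutate(Month  = row_number(),
         Change = cumsum(coalesce(Value / dplyr::lag(Value) - 1, 0))) %>%
  ungroup() %>%
  select(Date, Value, Change, Month, Decade)
```

With the real series, raw could be built as tibble(Date = index(CLF16OV), Value = as.numeric(coredata(CLF16OV))). Because the change is computed inside each group, every decade starts at 0, just as it does in the loop.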

Lastly, we push it into ggplot and generate our custom visualization!

ggplot(labor_force, aes(x=Month, y=Change, col=Decade))+
  geom_line( size = 1.5 )+
  scale_color_brewer(palette='Dark2')+
  ylab("Cumulative Change")+
  theme_bw()


Yielding

Okay… so I cheated a little and didn’t give you my custom theme. That’s okay. You could probably figure it out if you wanted to…
