
The kind folks over at @RStudio gave a nod to my recently CRAN-released epidata package in their January data package roundup and I thought it might be useful to give it one more showcase using the recent CRAN update to ggalt and the new hrbrthemes (github-only for now) packages.

### Labor force participation rate

The U.S. labor force participation rate (LFPR) is an oft-overlooked and under- or mis-reported economic indicator. I’ll borrow the definition from Investopedia:

> The participation rate is a measure of the active portion of an economy's labor force. It refers to the number of people who are either employed or are actively looking for work. During an economic recession, many workers often get discouraged and stop looking for employment, resulting in a decrease in the participation rate.
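To make the arithmetic behind that definition concrete: the rate is the labor force (people employed plus people unemployed but actively looking) divided by the relevant population. For the official U.S. figure that denominator is the civilian noninstitutional population aged 16 and over. The sketch below is illustrative only; participation_rate() is a hypothetical helper (not part of epidata) and the counts are made up.

```r
# Hypothetical helper (not part of epidata): labor force participation rate.
# labor force = employed + unemployed-but-actively-looking
participation_rate <- function(employed, unemployed, population) {
  stopifnot(all(population > 0))
  (employed + unemployed) / population
}

# Made-up counts (in millions), purely for illustration
participation_rate(employed = 150, unemployed = 6, population = 250)
# [1] 0.624
```

Note that someone who gives up looking drops out of the numerator entirely, which is exactly how discouraged workers pull the rate down during a recession.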

Population age distributions and other factors are necessary to honestly interpret this statistic. Parties in power usually dismiss/ignore this statistic outright and their opponents tend to wholly embrace it for criticism (it’s an easy target if you’re naive). “Yay” partisan democracy.
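One reason the population mix matters: the overall rate is a population-weighted average of the group rates, so it can fall even when no group changes its behavior. The sketch below uses made-up rates and population counts for a hypothetical "younger" and "older" group; overall_rate() is a hypothetical helper written only to show this composition effect.

```r
# Hypothetical helper: overall rate as a population-weighted average of group rates
overall_rate <- function(group_rates, group_pop) {
  sum(group_rates * group_pop) / sum(group_pop)
}

# Made-up numbers: each group's rate stays fixed, only the population mix shifts
rates  <- c(younger = 0.80, older = 0.40)
before <- c(younger = 70, older = 30)  # period 1 population counts
after  <- c(younger = 60, older = 40)  # period 2: the older share grows

overall_rate(rates, before)  # [1] 0.68
overall_rate(rates, after)   # [1] 0.64
```

Neither group participates any less, yet the headline rate drops four percentage points, which is why age structure (and, likewise, education mix) has to be part of any honest reading of the number.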

Since the LFPR has nuances when looked at categorically, let's take a look at it by attained education level to see how that particular view has changed over time (at least since the gov-quants have been tracking it).

We can easily grab this data with epidata::get_labor_force_participation_rate() (and we'll set up some library() calls while we're at it):

library(epidata)
library(hrbrthemes) # devtools::install_github("hrbrmstr/hrbrthemes")
library(ggalt)
library(tidyverse)
library(stringi)

part_rate <- get_labor_force_participation_rate("e")

glimpse(part_rate)
## Observations: 457
## Variables: 7
## $ date               1978-12-01, 1979-01-01, 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01...
## $ all                0.634, 0.634, 0.635, 0.636, 0.636, 0.637, 0.637, 0.637, 0.638, 0.638, 0...
## $ less_than_hs       0.474, 0.475, 0.475, 0.475, 0.475, 0.474, 0.474, 0.473, 0.473, 0.473, 0...
## $ high_school        0.690, 0.691, 0.692, 0.692, 0.693, 0.693, 0.694, 0.694, 0.695, 0.696, 0...
## $ some_college       0.709, 0.710, 0.711, 0.712, 0.712, 0.713, 0.712, 0.712, 0.712, 0.712, 0...
## $ bachelor's_degree  0.771, 0.772, 0.772, 0.773, 0.772, 0.772, 0.772, 0.772, 0.772, 0.773, 0...
## $ advanced_degree    0.847, 0.847, 0.848, 0.848, 0.848, 0.848, 0.847, 0.847, 0.848, 0.848, 0...

One of the easiest things to do is to use ggplot2 to make a faceted line chart by attained education level. But, let's change the labels so they are a bit easier on the eyes in the facets and switch the facet order from alphabetical to something more useful:

gather(part_rate, category, rate, -date) %>%
  mutate(category=stri_replace_all_fixed(category, "_", " "),
         category=stri_trans_totitle(category),
         category=stri_replace_last_regex(category, "Hs$", "High School"),
         category=factor(category, levels=c("Advanced Degree", "Bachelor's Degree", "Some College",
                                            "High School", "Less Than High School", "All"))) -> part_rate
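For anyone who wants to see what that reshape and relabeling produce without the tidyverse installed, here is a base R sketch of the same idea, run on the first two rows of the glimpse() output above. It is an illustration of what gather() plus the stringi calls do, not a drop-in replacement; to_title() is a hypothetical helper that capitalizes the first letter of each space-separated word (so "bachelor's" becomes "Bachelor's").

```r
# First two rows from the glimpse() output, in the wide layout epidata returns
wide <- data.frame(
  date = as.Date(c("1978-12-01", "1979-01-01")),
  all = c(0.634, 0.634),
  less_than_hs = c(0.474, 0.475),
  high_school = c(0.690, 0.691),
  some_college = c(0.709, 0.710),
  "bachelor's_degree" = c(0.771, 0.772),
  advanced_degree = c(0.847, 0.847),
  check.names = FALSE  # keep the apostrophe in the column name
)

# Wide -> long: one row per (date, category), like gather(part_rate, category, rate, -date)
long <- data.frame(
  date = rep(wide$date, times = ncol(wide) - 1),
  category = rep(names(wide)[-1], each = nrow(wide)),
  rate = unlist(wide[-1], use.names = FALSE)  # column-major, matching `each =` above
)

# Hypothetical helper: upper-case the first letter of each space-separated word
to_title <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    paste0(toupper(substring(words, 1, 1)), substring(words, 2), collapse = " ")
  }, character(1))
}

long$category <- gsub("_", " ", long$category, fixed = TRUE)
long$category <- to_title(long$category)
long$category <- sub("Hs$", "High School", long$category)
long$category <- factor(long$category,
                        levels = c("Advanced Degree", "Bachelor's Degree", "Some College",
                                   "High School", "Less Than High School", "All"))
str(long)
```

Setting the factor levels explicitly is what controls the facet order; any label that doesn't exactly match a level would silently become NA, so it's worth checking for NAs after this step.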

Now, we’ll make a simple line chart, tweaking the aesthetics just a bit:

ggplot(part_rate) +
  geom_line(aes(date, rate, group=category)) +
  scale_y_percent(limits=c(0.3, 0.9)) +
  facet_wrap(~category, scales="free") +
  labs(x=paste(format(range(part_rate$date), "%Y-%b"), collapse=" to "),
       y="Participation rate (%)",
       title="U.S. Labor Force Participation Rate",
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_ipsum_rc(grid="XY", axis="XY")

The "All" view is interesting in that the LFPR has held fairly "steady" between 60% & 70%. Those individual and fractional percentage points actually translate to real humans, so the "minor" fluctuations do matter.

It's also interesting to see the direct contrast between the starting historical rate and the current rate (you could also do the same with min/max rates, etc.). We can use a "dumbbell" chart to compare the 1978 value to today's value, but we'll need to reshape the data a bit first:

group_by(part_rate, category) %>%
  arrange(date) %>%
  slice(c(1, n())) %>%   # keep the first and last observation per category
  spread(date, rate) %>% # those two dates become column names
  ungroup() %>%
  filter(category != "All") %>%
  mutate(category=factor(category, levels=rev(levels(category)))) -> rate_range

filter(part_rate, category=="Advanced Degree") %>%
  arrange(date) %>%
  slice(c(1, n())) %>%
  mutate(lab=lubridate::year(date)) -> lab_df

(We'll be using the extra data frame to add labels to the chart.)

Now, we can compare the various ranges, once again tweaking aesthetics a bit:

ggplot(rate_range) +
  # date-named columns aren't syntactic names, so they need backticks;
  # the second one depends on the most recent month in your download
  geom_dumbbell(aes(y=category, x=`1978-12-01`, xend=`2016-12-01`),
                size=3, color="#e3e2e1",
                colour_x="#5b8124", colour_xend="#bad744",
                dot_guide=TRUE, dot_guide_size=0.25) +
  geom_text(data=lab_df, aes(x=rate, y=5.25, label=lab), vjust=0) +
  scale_x_percent(limits=c(0.375, 0.9)) +
  labs(x=NULL, y=NULL,
       title=sprintf("U.S. Labor Force Participation Rate %s-Present", lab_df$lab[1]),
       caption="Source: EPI analysis of basic monthly Current Population Survey microdata.") +
  theme_ipsum_rc(grid="X")
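The slice()/spread() step keeps the first and last observation per category and turns those two dates into column names, which is why the dumbbell layer has to refer to columns named after 1978-12-01 and 2016-12-01 (and why that second name changes whenever newer data arrives). As a hedged alternative, here is a base R sketch that names the endpoints start and end instead and adds the change in percentage points. first_last() is a hypothetical helper, and the toy input reuses a few Advanced Degree and High School values from the glimpse() output, so "end" here is just the last row of that small subset rather than today's value.

```r
# Toy long-format data: three monthly values per category, taken from glimpse()
long <- data.frame(
  date = as.Date(rep(c("1978-12-01", "1979-01-01", "1979-02-01"), times = 2)),
  category = rep(c("Advanced Degree", "High School"), each = 3),
  rate = c(0.847, 0.847, 0.848,
           0.690, 0.691, 0.692)
)

# Hypothetical helper: first and last rate per category, with fixed column names
first_last <- function(df) {
  df <- df[order(df$category, df$date), ]
  pieces <- lapply(split(df, df$category), function(g) {
    data.frame(category = g$category[1],
               start = g$rate[1],
               end = g$rate[nrow(g)])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out$change_pts <- round(100 * (out$end - out$start), 1)  # percentage points
  out
}

rate_range <- first_last(long)
rate_range
```

With columns named this way, the dumbbell layer could use aes(y=category, x=start, xend=end) regardless of which months the download covers.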

### Fin

One takeaway from both these charts is that it’s probably important to take education level into account when talking about the labor force participation rate. The get_labor_force_participation_rate() function — along with most other functions in the epidata package — also has options to factor the data by sex, race and age, so you can pull in all those views to get a more nuanced & informed understanding of this economic health indicator.