Calculate Wages and Benefits with blscrapeR

May 31, 2017

(This article was first published on Data Science Riot!, and kindly contributed to R-bloggers)

The most difficult thing about working with BLS data is gaining a clear understanding of what data are available and what they represent. Some of the more popular data sets can be found on the BLS Databases, Tables & Calculations website. The examples below cover only a selection of series and databases.

Install blscrapeR

The first step in analyzing any of these data in R is to install the blscrapeR package from CRAN.

install.packages("blscrapeR")

Current Population Survey (CPS)

The CPS includes median weekly earnings by occupation, among other things.

For example, we can use blscrapeR to pull the median weekly earnings of Database Administrators and Software Developers from the BLS API.

library(blscrapeR)
library(dplyr)    # %>% and rename()
library(tidyr)    # spread()
library(ggplot2)

# Median usual weekly earnings by occupation, unadjusted, second quartile (median).
# In current dollars.
df <- bls_api(c("LEU0254530800", "LEU0254530600"), startyear = 2000, endyear = 2009) %>%
    spread(seriesID, value) %>%
    dateCast()  # add a `date` column built from the year and period columns

# Plot
ggplot(data = df, aes(x = date)) +
    geom_line(aes(y = LEU0254530800, color = "Database Admins.")) +
    geom_line(aes(y = LEU0254530600, color = "Software Devs.")) +
    labs(title = "Median Weekly Earnings by Occupation", y = "Weekly earnings (current dollars)", color = NULL) +
    theme(legend.position = "top", plot.title = element_text(hjust = 0.5))

[Plot: Median Weekly Earnings by Occupation, Database Admins. vs. Software Devs.]

Occupational Employment Statistics (OES)

The OES contains wage data similar to those in the CPS, but often at a finer geographic resolution. Unlike the CPS, the OES is an annual survey and does not keep time series data.

For example, we may want to compare the hourly wage of Computer and Information Systems Managers in Orlando, FL, with that in San Jose, CA. Notice that the query below returns only a single annual value per series (2016 at the time of writing).

# Computer and Information Systems Managers in Orlando, FL and San Jose, CA.
# Orlando: "OEUM003674000000011302103"
# San Jose: "OEUM004194000000011302108"
df <- bls_api(c("OEUM003674000000011302103", "OEUM004194000000011302108"))

##   year period periodName value footnotes                  seriesID
## 1 2016    A01     Annual 67.84           OEUM003674000000011302103
## 2 2016    A01     Annual 87.53           OEUM004194000000011302108
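Before comparing two OES values, it helps to check what each series ID actually encodes. The sketch below uses a hypothetical helper, `parse_oes_id()` (not part of blscrapeR), and assumes the fixed-width 25-character OES series ID layout: prefix `OE`, seasonal code, area type, 7-digit area code, 6-digit industry code, 6-digit occupation code, and a 2-digit data type. Under that layout the Orlando ID ends in data type `03` and the San Jose ID in `08`; as we understand the BLS OES code list, those denote the hourly mean and hourly median wage respectively, so it is worth confirming both series measure the same statistic before reading too much into the gap.

```r
# Hypothetical helper (not part of blscrapeR): split OES series IDs into fields.
# Assumes the fixed-width 25-character OES series ID layout.
parse_oes_id <- function(ids) {
  stopifnot(all(nchar(ids) == 25))
  data.frame(
    series_id  = ids,
    prefix     = substr(ids, 1, 2),   # survey prefix, "OE"
    seasonal   = substr(ids, 3, 3),   # "U" = not seasonally adjusted
    area_type  = substr(ids, 4, 4),   # e.g. "N" national, "M" metropolitan area
    area       = substr(ids, 5, 11),  # 7-digit area code
    industry   = substr(ids, 12, 17), # 6-digit industry code
    occupation = substr(ids, 18, 23), # 6-digit occupation code
    datatype   = substr(ids, 24, 25), # 2-digit code for what the value measures
    stringsAsFactors = FALSE
  )
}

ids <- parse_oes_id(c("OEUM003674000000011302103", "OEUM004194000000011302108"))
ids[, c("area", "occupation", "datatype")]
```

Both IDs share occupation code `113021`; only the area and data type fields differ.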

Another OES example is retrieving the most recent annual mean wage for all occupations in all industries in the United States.

df <- bls_api("OEUN000000000000000000004")

##   year period periodName value footnotes                  seriesID
## 1 2016    A01     Annual 49630           OEUN000000000000000000004
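The OES reports some series as hourly wages and others as annual wages. BLS derives OES annual wages by multiplying the hourly wage by 2,080 hours (a year-round, full-time figure), so an hourly value such as the Orlando one can be put on an annual basis. A minimal sketch, assuming that 2,080-hour convention; `annualize_hourly()` is a hypothetical helper, and the convention does not apply to occupations that BLS reports only as annual wages.

```r
# Hypothetical helper: convert an OES hourly wage to an annual figure using the
# 2,080-hour (40 hours x 52 weeks) full-time, year-round convention.
annualize_hourly <- function(hourly, hours = 2080) {
  hourly * hours
}

# Orlando hourly value from the earlier query, on an annual basis
orlando_annual <- annualize_hourly(67.84)
orlando_annual  # 141107.2
```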

Employer Cost for Employee Compensation (ECEC)

This data set includes time series data on how much employers pay for employee benefits, both as an hourly dollar cost and as a percent of total compensation.

For example, to see the total cost of benefits per hour worked, and what percentage of total compensation that represents, we could run the following script.

df <- bls_api(c("CMU1030000000000D", "CMU1030000000000P"))

# Spread series IDs and rename columns to a human-readable format.
# The "P" series is the benefit cost as a percent of total compensation.
df.sp <- spread(df, seriesID, value) %>%
    rename(hourly_cost = CMU1030000000000D, pct_of_comp = CMU1030000000000P)

# Percentages come back as whole numbers (e.g. 31.6). Convert them to proportions to avoid confusion.
df.sp$pct_of_comp <- df.sp$pct_of_comp / 100

##   year period  periodName footnotes hourly_cost pct_of_comp
## 1 2014    Q04 4th Quarter                 10.49       0.316
## 2 2014    Q03 3rd Quarter                 10.07       0.313
## 3 2014    Q02 2nd Quarter                 10.00       0.313
## 4 2014    Q01 1st Quarter                  9.97       0.312
## 5 2015    Q04 4th Quarter                 10.52       0.313
## 6 2015    Q03 3rd Quarter                 10.48       0.314
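The rows above are not in chronological order, and the two columns can be cross-checked against each other: if the percentage is the benefit share of total compensation, dividing the hourly benefit cost by it gives the implied total hourly compensation. A base-R sketch using the values printed above; `ecec` and `implied_total_comp` are hypothetical names, not something the API returns.

```r
# Values copied from the ECEC output above:
# hourly benefit cost, and benefits as a share of total compensation
ecec <- data.frame(
  year        = c(2014, 2014, 2014, 2014, 2015, 2015),
  period      = c("Q04", "Q03", "Q02", "Q01", "Q04", "Q03"),
  hourly_cost = c(10.49, 10.07, 10.00, 9.97, 10.52, 10.48),
  pct_of_comp = c(0.316, 0.313, 0.313, 0.312, 0.313, 0.314),
  stringsAsFactors = FALSE
)

# Sort chronologically: by year, then by quarter code ("Q01" < "Q02" < ...)
ecec <- ecec[order(ecec$year, ecec$period), ]

# Implied total hourly compensation = benefit cost / benefit share of compensation
ecec$implied_total_comp <- round(ecec$hourly_cost / ecec$pct_of_comp, 2)
ecec
```

If the percentage were relative to wages and salaries instead, the implied figure would be wages alone, so this check is a quick way to confirm which denominator a series uses.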

National Compensation Survey-Benefits

This survey includes data on the share of workers who have access to certain benefits. For example, we can see the percentage of workers with access to paid vacation days and the percentage with access to health insurance through their employers.

df <- bls_api(c("NBU10500000000000033030", "NBU11500000000000028178"))

# Spread series IDs and rename columns to a human-readable format.
df.sp <- spread(df, seriesID, value) %>%
    rename(pct_paid_vacation = NBU10500000000000033030, pct_health_ins = NBU11500000000000028178)

# Values are whole-number percentages. Convert them to proportions to avoid confusion.
df.sp$pct_paid_vacation <- df.sp$pct_paid_vacation / 100
df.sp$pct_health_ins <- df.sp$pct_health_ins / 100

##   year period periodName footnotes pct_health_ins pct_paid_vacation
## 1 2014    A01     Annual                     0.72              0.74
## 2 2015    A01     Annual                     0.72              0.74
## 3 2016    A01     Annual                     0.70              0.73
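The percentage-to-proportion conversion repeats once per column, so with more benefit series it may be simpler to rescale every percentage column in one step. A base-R sketch, assuming percentage columns share the `pct_` prefix used in this post; `rescale_pct()` is a hypothetical helper, and the input values are the whole-number percentages implied by the output above.

```r
# Whole-number percentages, as the API returns them (implied by the output above)
ncs <- data.frame(
  year              = c(2014, 2015, 2016),
  pct_health_ins    = c(72, 72, 70),
  pct_paid_vacation = c(74, 74, 73)
)

# Hypothetical helper: rescale every column whose name starts with `prefix`
# from a 0-100 scale to a 0-1 scale, leaving other columns untouched
rescale_pct <- function(df, prefix = "pct_") {
  cols <- startsWith(names(df), prefix)
  df[cols] <- lapply(df[cols], function(x) x / 100)
  df
}

ncs <- rescale_pct(ncs)
ncs
```

Selecting columns by name prefix keeps the helper independent of how many series were requested.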

For more options, including mapping, see the blscrapeR package vignettes.
