Example 9.11: Employment plot

[This article was first published on SAS and R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


A facebook friend posted the picture reproduced above– it makes the case that President Obama has been a successful creator of jobs, and also paints GW Bush as a president who lost jobs. Another friend pointed out that to be fair, all of Bush’s presidency ought to be included. Let’s make a fair plot of job growth and loss. Data can be retrieved from the Bureau of Labor Statistics, where Nick will be spending his next sabbatical. The extract we use below is also available from the book website. This particular table reports the cumulative change over the past three months, adjusting for seasonal trends. This tends to smooth out the line.

SAS

The first job is to get the data into SAS. Here we demonstrate reading it directly from a URL, as outlined in section 1.1.6.
filename myurl
   url "http://www.math.smith.edu/sasr/datasets/bls.csv";
       
data q_change;
   infile myurl delimiter=',';
input  Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Annual;
run;

The raw data are in a pretty inconvenient format for plotting. To make a long, narrow data set with a row for each month, we’ll use proc transpose (section 1.5.3) to flip each year on its side. Then, to attach a date to each measure, we’ll use the compress function. First we add “01” (the first of the month) to the month name, which is in a variable created by proc transpose with the default name “_name_”. Then we tack on the year variable, and input the string in the date format. The resulting variable is a SAS date (number of days since 12/31/1959, see section 1.6.1).
proc transpose data=q_change out=q2;
  by year;
run;

data q3;
set q2;
  date1 = input(compress("01"||_name_||year),date11.);
run;

Now the data are ready to plot. It would probably be possible to use proc sgplot but proc gplot is more flexible and allows better control for presentation graphics.
title "3-month change in private-sector jobs, seasonally adjusted";
axis1 minor = none label = (h=2 angle = 90 "Thousands of jobs")
  value = (h = 2);
axis2 minor = none value = (h=2)label = none 
  offset = (1cm, -5cm)
  reflabel = (h=1.5 "Truman" "Eisenhower" "Kennedy/Johnson" 
    "Nixon/Ford" "Carter" "Reagan" "GHW Bush" "Clinton" "GW Bush" "Obama" );
symbol1 i=j v=none w=3;

proc gplot data=q3;
plot col1 * date1 / vaxis=axis1 haxis=axis2 vref=0 
  href = '12apr1945'd '21jan1953'd '20jan1961'd '20jan1969'd 
    '20jan1977'd '21jan1981'd '20jan1989'd '20jan1993'd 
    '20jan2001'd '20jan2009'd;
format date1 monyy6.;
run;
quit;

Much of the syntax above has been demonstrated in our book examples and blog entries. What may be unfamiliar is the use of the href option in the plot statement and the reflabel option in the axis statement. The former draws reference lines at the listed values in the plot, while the latter adds titles to these lines. The resulting plot is shown here.

Looking fairly across postwar presidencies, only the Kennedy/Johnson and Clinton years were mostly unmarred by periods with large losses in jobs. The Carter years were also times jobs were consistently added. While the graphic shared on facebook overstates the case against GW Bush, it fairly shows Obama as a job creator thus far, to the extent a president can be credited with jobs created on his watch.


R

The main trick in R is loading the data and getting it into the correct format.
here we use cbind() to grab the appropriate columns, then transpose that matrix and turn it into a vector which serves as input for making a time series object with the ts() command (as in section 4.2.8). Once this is created, the default plot for a time series object is close to what we have in mind.
ds = read.csv("http://www.math.smith.edu/sasr/datasets/bls.csv", 
              header=FALSE)
jobs = with(ds, cbind(V2, V3, V4, V5, V6, V7, V8, V9, V10,
                      V11, V12, V13))
jobsts = ts(as.vector(t(jobs)), start=c(1945, 1), 
            frequency=12)
plot(jobsts, plot.type="single", col=4,
     ylab="number of jobs (in thousands)")

All that remains is to add the reference lines for 0 jobs and the presidencies. The lines are most easily added with the abline() function (section 5.2.1). Easier than adding labels for the lines within the plot function will be to use the mtext() function to place the labels in the margins. We’ll write a little function to save a few keystrokes by plotting the line and adding the label together.
abline(h=0)
presline = function(date,line,name){
  mtext(at=date,text= name, line=line)
  abline(v = date)
}
presline(1946,1,"Truman")
presline(1953,2,"Eisenhower")
presline(1961,1,"Kennedy/Johnson")
presline(1969,2,"Nixon/Ford")
presline(1977,1,"Carter")
presline(1981,2,"Reagan")
presline(1989,1,"GHW Bush")
presline(1993,2,"Clinton")
presline(2001,1,"GW Bush")
presline(2009,2,"Obama")

It might be worthwhile to standardize the number of jobs to the population size, since the dramatic loss of jobs due to demobilization after the Second World War during a single month in 1945 (2.4 million) represented 1.7% of the population, while the recent loss of 2.3 million jobs in 2009 represented only 0.8% of the population.


To leave a comment for the author, please follow the link and comment on their blog: SAS and R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)