A Short Return to the Age-Earnings Profile

[This article was first published on Back Side Smack » R Stuff, and kindly contributed to R-bloggers].

Two posts ago I mentioned the age-earnings profile but did not provide a regression of log earnings on age. I also offered, without evidence, that fitting a simple linear regression would be inappropriate. How do I know that? How could we determine the appropriateness of a regression? There are a number of technical or econometric means to check mechanically whether a specification is appropriate. Two common candidates are the Breusch-Pagan test (a story about which will be left for another time) and the White test. Strictly speaking, both test for heteroskedasticity, not functional form. However, imagine a process where our model is:

  • y_i = alpha + beta X_i + epsilon_i

But the true process is

  • y_i = alpha + beta X^2_i + epsilon_i

Our residuals (different from the errors!) from fitting the first model to data generated by the second will vary systematically with the X term, so a heteroskedasticity test can flag the problem much as though our errors were heteroskedastic. But for simple enough models, we can take a step back and eyeball the regression. If we fit a linear model to a quadratic or otherwise nonlinear relationship and plot the residuals against the X term, we should be able to see some shape emerge. If our model is well specified and the underlying process is linear, the residuals will scatter around zero with no visible pattern across the independent variable. If our model is mis-specified (as in our example above) the residuals might look like this:

From AEP
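The same pattern can be reproduced on simulated data. The following is only a sketch under stated assumptions: the variable names, coefficients, sample size, and seed are made up for illustration and have nothing to do with the dataset used in this post. It fits the linear model to a quadratic process, plots the residuals against x, and runs a White-style auxiliary regression by hand in base R.

```r
# Simulated example: all names and numbers here are illustrative assumptions.
set.seed(42)                        # arbitrary seed, for reproducibility
n <- 500
x <- runif(n, 0, 3)                 # hypothetical regressor
y <- 1 + 2 * x^2 + rnorm(n)         # true process: quadratic, homoskedastic errors

fit_lin  <- lm(y ~ x)               # the mis-specified (linear) model
fit_quad <- lm(y ~ I(x^2))          # the correctly specified model
res <- resid(fit_lin)

# Eyeball check: residuals against x, with a lowess smooth to show the shape.
plot(x, res, xlab = "x", ylab = "residuals from linear fit")
lines(lowess(x, res), col = "red", lwd = 2)
abline(h = 0, lty = 2)

# White-style auxiliary regression by hand: squared residuals on x and x^2.
# If the model is correctly specified and the errors are homoskedastic,
# n * R^2 is approximately chi-squared with 2 degrees of freedom.
aux     <- lm(I(res^2) ~ x + I(x^2))
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

cat("LM statistic:", round(lm_stat, 2), " p-value:", signif(p_value, 3), "\n")
cat("AIC linear:", AIC(fit_lin), " AIC quadratic:", AIC(fit_quad), "\n")
```

The errors in this simulation are homoskedastic by construction, so a small p-value here reflects the wrong functional form and not changing error variance. The auxiliary regression includes the squared term because, in this particular design, the squared residuals are roughly symmetric around the middle of the x range, and a version with x alone may have little power.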

The residual plot shown above for the age-earnings data is easily recovered with plot(lm(log(eph) ~ age, data=adams)), a command which brings up several diagnostic plots in turn. Let’s fit a local regression to the data and see what comes out.

From AEP
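For readers without the data from the earlier post, here is a minimal sketch of the comparison on a simulated stand-in. The data frame sim, its columns, and the concave profile that generates it are assumptions made up for illustration; they are not the adams data. loess() is used as one possible local regression, with its default settings.

```r
# Simulated stand-in for the age-earnings data: purely illustrative numbers.
set.seed(1)                                  # arbitrary seed
n   <- 400
age <- sample(18:65, n, replace = TRUE)
# Hypothetical concave profile: log earnings rise, flatten, then decline.
log_eph <- 1 + 0.09 * age - 0.001 * age^2 + rnorm(n, sd = 0.3)
sim <- data.frame(age = age, log_eph = log_eph)

fit_lin   <- lm(log_eph ~ age, data = sim)      # simple linear regression
fit_loess <- loess(log_eph ~ age, data = sim)   # local regression, default span

# First diagnostic plot from plot.lm: residuals against fitted values.
plot(fit_lin, which = 1)

# Scatter with both fits overlaid; predict only inside the observed age range.
grid <- data.frame(age = seq(min(sim$age), max(sim$age)))
plot(sim$age, sim$log_eph, xlab = "age", ylab = "log hourly earnings",
     col = "grey50")
abline(fit_lin, lty = 2)                                  # linear fit
lines(grid$age, predict(fit_loess, newdata = grid),
      col = "red", lwd = 2)                               # local fit

# In-sample root mean squared error of each fit.
rmse <- function(fit) sqrt(mean(resid(fit)^2))
cat("RMSE linear:", rmse(fit_lin), " RMSE loess:", rmse(fit_loess), "\n")
```

The local fit is free to bend with the data, so it picks up the hump that the straight line misses. The price is that its shape depends on the smoothing span, which is left at the default here.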

We probably over-estimate the decline in earnings as age increases, but this is much better than our linear regression. Some causes of mis-estimation might be within our capacity to easily solve. I mentioned in the last post that a proper age-earnings profile would correctly code the ages of workers in the dataset, subtracting years of schooling from age. We might also talk about non-wage compensation and how that may increase over time. Further, we have dropped all the zeros from our dataset, which is pretty inappropriate. Correcting for entry and exit from the labor force may change the shape of our profile.

The code behind the figures isn’t included because it is basically two lines stemming immediately from the past post.
