# Our Friend the Age-Earnings Profile

March 7, 2011

(This article was first published on Back Side Smack » R Stuff, and kindly contributed to R-bloggers)

I like Labor Economics. Partly because it has a nice mix of theory and practical empiricism, but mostly because it is a sub-field with a number of agreed-upon stylized facts that grew not out of micro theory but out of hundreds of empirical studies. One of those facts is the age-earnings profile. Basically, as individuals age they experience relatively rapid wage growth in their 20s and 30s and slower wage growth in their 40s and 50s. There is quite a bit of discussion over what causes the age-earnings profile: internal labor markets, human capital accumulation, and worker-employer matching, among others. There are a number of competing theories, and not all of them are mutually exclusive. But the profile itself is pretty robust.

What does the profile look like?

*Figure: log hourly earnings against age, March 2002 CPS sub-sample.*

Above is a scatterplot of log hourly earnings against age. It was drawn from a small sub-sample of the March 2002 Current Population Survey (basically New England plus NY, NJ and PA). Education in this sub-sample was coded as “college” or “high school”, with dropouts from either removed from the sample, so we don’t get a true age-earnings profile. A true profile would subtract years of education (plus 5 or 6 for the years between birth and the start of schooling) from age to get a measure of potential experience. But you can see the basic shape of the age-earnings profile in the plot: a slight increase in average log earnings over the first 10-20 years in the workforce, then a flattening of the earnings curve. I have not actually plotted a regression of log earnings on age, but we could. We could certainly fit a linear regression, but the residuals would show a systematic pattern by age (positive in the middle of the age range, negative at the ends), a sign that a straight line doesn’t capture the curvature in the data. We could also add a quadratic term to the linear regression (e.g. `lm(log(earnings) ~ age + I(age^2))`) or use a spline term in the regression.
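To make the linear vs. quadratic vs. spline comparison concrete, here is a minimal R sketch. The CPS extract isn't available here, so it simulates log hourly wages from an assumed concave age profile; the coefficients, the data frame `dat`, the column `log_wage`, and the age bands are all hypothetical. It fits the three specifications, shows the age pattern left in the linear model's residuals, and compares the fits by AIC.

```r
library(splines)  # ships with base R; provides ns() natural cubic splines

set.seed(42)
n <- 2000
age <- sample(18:64, n, replace = TRUE)
# Hypothetical concave profile: faster growth early, flattening later
log_wage <- 2 + 0.08 * age - 0.0008 * age^2 + rnorm(n, sd = 0.5)
dat <- data.frame(age = age, log_wage = log_wage)

fit_lin  <- lm(log_wage ~ age, data = dat)              # straight line
fit_quad <- lm(log_wage ~ age + I(age^2), data = dat)   # quadratic in age
fit_spl  <- lm(log_wage ~ ns(age, df = 4), data = dat)  # natural spline

# Mean linear-model residual by age band: negative at the ends and
# positive in the middle means the line misses the curvature
band <- cut(dat$age, breaks = c(17, 25, 35, 45, 55, 64))
lin_resid_by_band <- tapply(resid(fit_lin), band, mean)
print(round(lin_resid_by_band, 3))

# Compare the three specifications (lower AIC is better)
print(AIC(fit_lin, fit_quad, fit_spl))
```

With real data you would swap the simulated `dat` for the survey extract. The spline's `df = 4` is an arbitrary choice here; more degrees of freedom let the curve bend more at the cost of noisier estimates.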

One thing to consider when choosing the form of the regression is which other control variables to include. On this same dataset, we can see the distribution of earnings among high school and college graduates:

*Figure: distribution of earnings for high school and college graduates, March 2002 CPS sub-sample.*

The distributions are not conditional on age, race or sex, but we can see a great deal more variation in earnings among college graduates than among high school graduates. So we might also expect the age-earnings profile to be shaped somewhat differently for college graduates. There are a few reasons why the variation might differ. First, high school graduates are probably more likely to be paid hourly rather than salaried, and hourly wages are probably measured much more accurately than an hourly wage imputed from salary information. Second, since high school graduates' earnings are lower, they are also more likely to run into binding wage floors (minimum wage laws, etc.), which would compress their earnings distribution. Either way, we can expect the profile to have a different shape among high school graduates than among college graduates.
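One way to let the profile's shape differ by education is to interact the profile terms with an education factor and test those interactions against a pooled model. The minimal R sketch below does this on simulated data. It also builds potential experience as age minus years of schooling minus 6, assuming 12 years for high school and 16 for college. Every coefficient and noise level is invented, chosen only so that the college profile is steeper and noisier; `pooled`, `by_educ`, and the other names are hypothetical.

```r
set.seed(1)
n <- 2000

# Simulated sample with education coded as "high school" / "college"
educ <- factor(sample(c("high school", "college"), n, replace = TRUE),
               levels = c("high school", "college"))
age <- sample(22:64, n, replace = TRUE)

# Potential experience = age - years of schooling - 6 (12 / 16 years assumed)
years_school <- ifelse(educ == "college", 16, 12)
exper <- age - years_school - 6

# Invented data-generating process: steeper, noisier profile for college grads
is_college <- educ == "college"
log_wage <- ifelse(is_college,
                   2.4 + 0.06 * exper - 0.0012 * exper^2 + rnorm(n, sd = 0.6),
                   2.0 + 0.03 * exper - 0.0005 * exper^2 + rnorm(n, sd = 0.35))
dat <- data.frame(log_wage, exper, educ)

# Pooled: one profile shape, education only shifts the intercept
pooled <- lm(log_wage ~ educ + exper + I(exper^2), data = dat)
# Interacted: separate intercept, slope and curvature per education group
by_educ <- lm(log_wage ~ educ * (exper + I(exper^2)), data = dat)

# F-test of the interaction terms: does the shape differ by education?
shape_test <- anova(pooled, by_educ)
print(shape_test)

# Residual spread by group, echoing the wider college earnings distribution
resid_sd <- tapply(resid(by_educ), dat$educ, sd)
print(round(resid_sd, 3))
```

Because the residual variance differs by group, the classical F-test here is only approximate; heteroskedasticity-robust standard errors would be a natural refinement.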

Code for this post is below. It includes a lot more than the graphs above because it was part of a homework assignment. Some of the code is cool (especially converting binary factors into single columns with levels), but most of it is housekeeping.
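As an illustration of the dummy-to-factor idea (not the author's homework code), here is a minimal R sketch that collapses a set of 0/1 indicator columns into a single factor. The data frame `people`, the columns `white`, `black`, `other`, and the helper `dummies_to_factor()` are hypothetical, and it assumes every row has exactly one indicator set to 1.

```r
# Hypothetical 0/1 indicator columns, one per category
people <- data.frame(white = c(1, 0, 0, 1),
                     black = c(0, 1, 0, 0),
                     other = c(0, 0, 1, 0))

# Hypothetical helper: map each row to the name of its "1" column
dummies_to_factor <- function(d, cols) {
  m <- as.matrix(d[, cols])
  stopifnot(all(rowSums(m) == 1))  # each row must belong to exactly one category
  # max.col() returns the column index of each row's maximum (the 1)
  factor(cols[max.col(m, ties.method = "first")], levels = cols)
}

people$race <- dummies_to_factor(people, c("white", "black", "other"))
print(people$race)
```

Setting `levels = cols` keeps the categories in the original column order, so the first column becomes the reference level in any later `lm()` call.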
