HEURISTICS FOR ESTIMATING LIFE EXPECTANCY
We once learned from a doctor a rule of thumb for predicting how long a person will live (i.e., the life expectancy). The doctor’s heuristic was:
(100 minus the patient’s age) divided by 2
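For readers who like to see a rule as code, here is a minimal Python sketch of the doctor’s heuristic (the post’s own code was in R; the function name `doctor_rule` is ours, not the doctor’s).

```python
def doctor_rule(age):
    """Doctor's rule of thumb: remaining years = (100 - age) / 2."""
    return (100 - age) / 2


# Try it on a few ages from the nursing-home range.
for age in (65, 80, 95):
    print(age, doctor_rule(age))
```

Note that the rule returns zero at age 100 and goes negative beyond it.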
We wanted to see how accurate this rule was, so we downloaded life expectancy data from the US government and plotted the model’s predictions against the government’s estimates of life expectancy. See above. The doctor’s model is in blue. It’s pretty good in the 65 to 95 age range. The doctor worked in a nursing home. The heuristic fit the environment. That said, the doctor’s rule is lousy outside that age range. And of course it assumes people will die by 100.
The doctor’s heuristic is a simple linear model. How well does simple linear regression do? We solved for it and plotted it in red above. We’ll see later how they compare in error, but it’s safe to say they’re both pretty lousy.
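For the curious, “solving for” a simple linear regression can be done with the textbook least-squares formulas. The sketch below is in Python rather than the R we used, and `fit_line` is a hypothetical helper name. Since the government table isn’t reproduced here, the sanity check feeds it points generated from the doctor’s rule, which it should recover exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Sanity check on synthetic points from the doctor's rule (NOT real data):
# the fit should come back with intercept 50 and slope -0.5.
ages = list(range(0, 101, 10))
years = [(100 - a) / 2 for a in ages]
print(fit_line(ages, years))
```

With real life-table values in place of `years`, the same function would give the red line’s intercept and slope.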
Let’s fit some better models. Forget survival models. Too hard for mortals to apply. Looking at the life expectancy curve, it seems like a polynomial and a two-part linear function would do a good job. They do.
However, our goal is to get something that someone could do in their head. Something like the doctor’s heuristic, but smarter.
We came up with two candidates.
1) The heuristic bi-linear model. We made this by making the best bi-linear model a bit simpler to apply.
If you’re under 85, your life expectancy is 72 minus 80% of your age.
Otherwise it’s 22 minus 20% of your age.
2) The 50-15-5 model. This one asks you to remember some key values and then to interpolate between those values. It goes:
The life expectancies of 30, 70, 90 and 110 year olds are about 50, 15, 5, and 0.
Go forth and interpolate!
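Both candidates are easy to write down as code. Here is a rough Python sketch (the function names are ours). One assumption to flag: the 50-15-5 rule says nothing about ages under 30 or over 110, so this version simply refuses to guess outside that range.

```python
def bilinear_heuristic(age):
    """Heuristic bi-linear model: two straight lines that switch at age 85."""
    if age < 85:
        return 72 - 0.8 * age
    return 22 - 0.2 * age


# Anchor points to memorize for the 50-15-5 model.
ANCHOR_AGES = (30, 70, 90, 110)
ANCHOR_YEARS = (50, 15, 5, 0)


def fifty_fifteen_five(age):
    """50-15-5 model: linear interpolation between the memorized anchors."""
    # Assumption: no extrapolation outside the anchor range.
    if not ANCHOR_AGES[0] <= age <= ANCHOR_AGES[-1]:
        raise ValueError("age must be between 30 and 110")
    for i in range(len(ANCHOR_AGES) - 1):
        lo, hi = ANCHOR_AGES[i], ANCHOR_AGES[i + 1]
        if lo <= age <= hi:
            frac = (age - lo) / (hi - lo)
            return ANCHOR_YEARS[i] + frac * (ANCHOR_YEARS[i + 1] - ANCHOR_YEARS[i])


for age in (50, 80, 100):
    print(age, bilinear_heuristic(age), fifty_fifteen_five(age))
```

A 50 year old, for example, sits halfway between the 30 and 70 anchors, so the 50-15-5 answer is halfway between 50 and 15.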
Here is the performance of the heuristic models:
It’s not a horse race without some measure of accuracy. Below we plot the mean absolute deviation for all the models. Except for the linear models, the heuristics make estimates that are off by less than one year on average. That said, keep in mind that a life expectancy is only a best guess, and there’s a lot of variation around it. The 90% confidence interval around my estimated age at death spans roughly 40 years!
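Mean absolute deviation is just the average size of the misses. Here is a minimal Python sketch of the scoring (names are ours). Big caveat: the government table isn’t reproduced in this post, so the reference values below are only the four rough anchor points from the 50-15-5 model, used as a stand-in. The numbers it prints will not match the plot.

```python
def mean_absolute_deviation(model, ages, reference):
    """Average of |prediction - reference| over the given ages."""
    errors = [abs(model(a) - r) for a, r in zip(ages, reference)]
    return sum(errors) / len(errors)


def doctor_rule(age):
    return (100 - age) / 2


def bilinear_heuristic(age):
    return 72 - 0.8 * age if age < 85 else 22 - 0.2 * age


# Stand-in reference values: the post's approximate anchors, NOT the real table.
ages = [30, 70, 90, 110]
approx_life_expectancy = [50, 15, 5, 0]

for name, model in [("doctor", doctor_rule), ("bi-linear", bilinear_heuristic)]:
    print(name, mean_absolute_deviation(model, ages, approx_life_expectancy))
```

Swap in a full life table for `ages` and `approx_life_expectancy` to score any heuristic you dream up.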
The lesson is, linear fits to life expectancy are bad. Everything else is pretty good.
Can you come up with better heuristics? Here’s some R code to see if you can:
For fun, check out this sweet plot of how the mean absolute error changes as you vary the cut point in the best bi-linear model. The cut point is 30 + “cut” in the graph, so we cut around age 80.
ADDENDUM: Dean Foster just stopped by my desk and promoted