Date of death, birthday and Elvis Presley

June 18, 2012

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

Ten days ago, a newly published study claimed that “Death has a preference for birthdays” (that is its title). The conclusion of the paper is that, in general, birthdays do not evoke a postponement mechanism but appear to end up in a lethal way more frequently than expected (“anniversary reaction”). Well, this is not new: several earlier articles made the same point, e.g. Angermeyer et al. (1987).

I found the idea interesting since, in demography, there is a large literature on extrapolating death rates from discrete to continuous time. These extrapolations are usually extremely smooth, and none of them accounts for this excess mortality precisely on the birthday. The problem is that it is rather difficult to say anything, since datasets with individual observations are rarely available online.

But yesterday, @coulmont sent me a tweet mentioning a website. I do not know if this is legal (even if some explanations are given), but I will mention it anyway. It hosts the so-called Social Security Death Master File, containing individual information about deaths in the US, as well as geographic information, for people having a social security number.

With R, it is possible to work on those files (even if they are huge, with tens of millions of observations). For instance, we can check who is inside.

> elvis=scan("ssdm2",skip=22371720,n=1,what="character",sep=",")
> elvis
[1] " 409522002PRESLEY         ELVIS     0800197701081935  " 

If you believe that Elvis is dead, you might agree that this database can be accurate (or at least, not too bad). We can also see here how to read a record: the last 8 digits give the date of birth (January 8, 1935) and the 8 digits before give the date of death. Elvis died on August 16, 1977, but the record only says 08 00 1977: the day of death is missing. So there are some problems with the dataset, and we remove all the observations that do not give us proper dates. Then, the idea is to assume that the person died in 2000 (or any other year, since the point is to focus on days and months). We count the number of days between the day of death and the birthday in 2001 (which would have come after the death) and the one in 2000 (which came either before or after the death), so that we can derive the number of days after the birthday:

DIFF=apply(diffday,1,function(x) {min(x[x>=0])})
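To make the steps above concrete, here is a minimal sketch of how one record could be turned into a number of days since the previous birthday. The helper names `parse_record` and `days_since_birthday` are hypothetical, and the sketch assumes that each record ends with the 8-digit date of death followed by the 8-digit date of birth, as in the Elvis line. It is also a variant of the computation described above, not the original code: instead of comparing the death with two candidate birthdays, it places both dates in the leap year 2000 and wraps negative differences around a 366-day year.

```r
# Hypothetical helper: extract month and day of death and birth from one record.
# Assumption: the record ends with MMDDYYYY (death) then MMDDYYYY (birth).
parse_record <- function(rec) {
  rec <- trimws(rec)
  n <- nchar(rec)
  dates <- substr(rec, n - 15, n)  # last 16 characters
  list(death_month = as.integer(substr(dates, 1, 2)),
       death_day   = as.integer(substr(dates, 3, 4)),
       birth_month = as.integer(substr(dates, 9, 10)),
       birth_day   = as.integer(substr(dates, 11, 12)))
}

# Hypothetical helper: days elapsed since the previous birthday.
# Both dates are placed in 2000 (a leap year, so February 29 is valid).
# Invalid dates, such as a day recorded as 00, come out as NA.
days_since_birthday <- function(death_month, death_day, birth_month, birth_day) {
  death    <- as.Date(sprintf("2000-%02d-%02d", death_month, death_day),
                      format = "%Y-%m-%d")
  birthday <- as.Date(sprintf("2000-%02d-%02d", birth_month, birth_day),
                      format = "%Y-%m-%d")
  # A negative difference means the birthday falls later in the year,
  # so the previous birthday was in the preceding year: wrap around.
  as.integer(death - birthday) %% 366
}

elvis <- parse_record(" 409522002PRESLEY         ELVIS     0800197701081935  ")
# NA, since the day of death is recorded as 00
days_since_birthday(elvis$death_month, elvis$death_day,
                    elvis$birth_month, elvis$birth_day)
# With the actual day of death (August 16): 221 days after January 8
days_since_birthday(8, 16, 1, 8)
```

Records that return NA are exactly the ones to drop before counting. The wrap-around treats every year as having 366 days, which can shift the count by one day for deaths that occur before the birthday; this does not affect day 0.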

What we have here is the number of days following the previous birthday. If we look at the distribution of that number of days, we obtain

> counts["0"]/(mean(counts[100:200]))

Thus, the excess of deaths on the birthday was around 12%, which is rather close to the figure obtained from the Swiss mortality statistics 1969–2008 (in Ajdacic-Gross et al. (2012)). Note that here, we just play with a small subset of the entire dataset.
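The ratio used above can be illustrated on toy data. The sketch below assumes that `counts` is a table of `DIFF` with one cell per day from 0 to 365; the vector is constructed by hand so that day 0 has exactly 12% more deaths than any other day, and it is not taken from the Death Master File.

```r
# Toy stand-in for DIFF (NOT real data): 100 deaths on each day 0..365,
# plus 12 extra deaths on day 0, the birthday itself.
DIFF <- c(rep(0:365, each = 100), rep(0, 12))

# Tabulate with explicit levels so that every day has a cell, even if empty.
counts <- table(factor(DIFF, levels = 0:365))

# counts["0"] selects by name (day 0), while counts[100:200] selects by
# position, i.e. days 99 to 199 when all 366 levels are present.
baseline <- mean(as.numeric(counts[100:200]))
ratio <- as.numeric(counts["0"]) / baseline

ratio              # 1.12 on this toy vector
100 * (ratio - 1)  # excess on the birthday, in percent
```

Taking the baseline from the middle of the year keeps it away from the days around the birthday, where any anticipation or postponement effect would show up.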

That database is probably extremely interesting, except that it suffers from a huge selection bias, since only dead people are in it. So it might be useless if we wish to compare the life expectancy of people named Bill with that of people named Georges (which is what I wanted to investigate initially). But we’ll see what else we can do with it (since Ewen has been able to write some code to go through that huge dataset).
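Going through a file of that size does not require loading it all at once. The following is a minimal sketch of one way to do it, and not Ewen's code: the helper `process_in_chunks` is hypothetical, the three records are made up, and the date layout is assumed to be the one seen in the Elvis line.

```r
# Hypothetical helper: apply `fun` to a text file chunk by chunk
# and add up the numbers it returns.
process_in_chunks <- function(path, fun, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)  # resumes where the last read stopped
    if (length(lines) == 0) break            # end of file
    total <- total + fun(lines)
  }
  total
}

# Toy file standing in for the real one (three made-up records).
tmp <- tempfile()
writeLines(c(" 000000001DOE             JOHN      0102199903041920  ",
             " 000000002DOE             JANE      0000198005061930  ",
             " 000000003ROE             RICHARD   1231200112311940  "), tmp)

# Count the records whose day of death is not "00".
# Assumption: each record ends with MMDDYYYY (death) then MMDDYYYY (birth).
n_valid <- process_in_chunks(tmp, function(x) {
  x <- trimws(x)
  death_day <- substr(x, nchar(x) - 13, nchar(x) - 12)
  sum(death_day != "00")
}, chunk_size = 2L)

n_valid  # 2: the second record has no day of death
unlink(tmp)
```

Only one chunk is held in memory at a time, so the same loop would work on tens of millions of lines, at the cost of a single sequential pass over the file.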
