Mortality by Weekday and Age

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A few days ago, I did mention on Twitter a nice graph, with

My colleague Jean-Philippe was extremely sceptical, so I tried to reproduce that graph. The good thing is that we have the Social Security Death Master File, for data in the US. To be more specific, I have three big files on my hard drive, and in order to reproduce that graph, we’ll load the data by chunks. But before, because we have the day of birth, and the day of death, I need a function to compute the age. So here it is

> age_years <- function(earlier, later)
+ {
+   lt <- data.frame(earlier, later)
+   age <- as.numeric(format(lt[,2],format="%Y")) - as.numeric(format(lt[,1],format="%Y"))
+   dayOnLaterYear <- ifelse(format(lt[,1],format="%m-%d")!="02-29",
+                            as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")),
+                            ifelse(as.numeric(format(later,format="%Y")) %% 400 == 0 | as.numeric(format(later,format="%Y")) %% 100 != 0 & as.numeric(format(later,format="%Y")) %% 4 == 0,
+                                   as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")),
+                                   as.Date(paste(format(lt[,2],format="%Y"),"-","02-28",sep=""))))
+   age[which(dayOnLaterYear > lt$later)] <- age[which(dayOnLaterYear > lt$later)] - 1
+   age
+ }

from github.com/nzcoops. Now, it is possible to create a similar table, based on that huge file (we have almost 50 million observations)

> cols <- c(1,9,20,4,15,15,1,2,2,4,2,2,4,2,5,5,7)
> noms_col <- c ("code","ssn","last_name","name_suffix","first_name","middle_name","VorPCode","date_death_m","date_death_d","date_death_y","date_birth_m","date_birth_d","date_birth_y","state","zip_resid","zip_payment","blanks")
> library(LaF)

> TABLE_AGE_DAY=function(temp = "ssdm3"){
+ ssn <- laf_open_fwf( temp,column_widths = cols,column_types=rep("character",length(cols) ),column_names = noms_col,trim = TRUE)
+ object.size(ssn)
+ go_through <- seq(1,nrow(ssn),by = 1e05 )
+ if(go_through[ length(go_through)] != nrow( ssn)) go_through <- c(go_through,nrow( ssn))
+ go_through <- cbind(go_through[-length(go_through)],c(go_through[-c(1,length(go_through)) ]-1,go_through [ length(go_through)]))
+ go_through
+ 
+ pb <- txtProgressBar(min = 0, max = nrow( go_through), style = 3)
+ count_birthday <- function(s){
+   #print(s)
+   setTxtProgressBar(pb, s)
+   data <- ssn[ go_through[s,1]:go_through[s,2],c("date_death_y","date_death_m","date_death_d",
+                                                  "date_birth_y","date_birth_m","date_birth_d")]
+   date1=as.Date(paste(data$date_birth_y,"-",data$date_birth_m,"-",data$date_birth_d,sep=""),"%Y-%m-%d")
+   date2=as.Date(paste(data$date_death_y,"-",data$date_death_m,"-",data$date_death_d,sep=""),"%Y-%m-%d")
+   idx=which(!(is.na(date1)|is.na(date2)))
+   date1=date1[idx]
+   date2=date2[idx]
+   itg=try(age<-age_years(date1,date2),silent=TRUE)
+   if(inherits(itg, "try-error")) age=trunc((date2-date1)/365.25)
+   w=weekdays(date2)
+   T=table(age,w)
+   Tab=matrix(0,106,7)
+   for(i in 1:nrow(T)) if(as.numeric(rownames(T)[i])<106) Tab[as.numeric(rownames(T)[i]),]=T[i,]
+   return(Tab)
+ }
+ D <- lapply( seq_len(nrow( go_through)),count_birthday) 
+ T=D[[1]]
+ for(s in 2:length(D)) T=T+D[[s]]
+ return(T)
+ }

If we run that function on the three files

> D1=TABLE_AGE_DAY("ssdm1")
|========================================| 100%
> D2=TABLE_AGE_DAY("ssdm2")
|========================================| 100%
> D3=TABLE_AGE_DAY("ssdm3")
|========================================| 100%

we can visualize not percentages, as on the figure above, but counts

> D=D1+D2+D3
> colnames(D)=
c("Sun","Thu","Mon","Tue","Wed","Sat","Fri")
> D=D1[,
c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")]

and we have here (I remove the Saturday to get a better output)

> D[,1:6]
          Sun    Mon    Tue    Wed    Thu    Fri
  [1,]   2843   2888   2943   3020   2979   3038
  [2,]   2007   1866   1918   1974   1990   2137
  [3,]   1613   1507   1532   1530   1515   1613
  [4,]   1322   1256   1263   1259   1207   1330
  [5,]   1155   1061   1092   1128   1112   1171
  [6,]   1067    985    950   1082   1009   1055
  [7,]   1129    901    915    954    941   1044
  [8,]   1026    927    944    935    911   1005
  [9,]   1029   1012    871    908    939    998
 [10,]   1093   1011    974    958    928   1018
 [11,]   1106   1031   1019   1036   1087   1122
 [12,]   1289   1219   1176   1215   1141   1292
 [13,]   1618   1455   1487   1484   1466   1633
 [14,]   2121   2000   1900   1941   1845   2138
 [15,]   2949   2647   2519   2499   2524   2748
 [16,]   4488   3885   3798   3828   3747   4267
 [17,]   5709   4612   4520   4422   4443   5005
 [18,]   7280   5618   5400   5271   5344   5986
 [19,]   8086   6172   5833   5820   6004   6628
 [20,]   8389   6507   6166   6055   6430   6955
 [21,]   8794   7038   6794   6628   6841   7572
 [22,]   8578   6528   6512   6472   6757   7342
 [23,]   8345   6750   6483   6469   6714   7338
 [24,]   8361   6859   6589   6623   6854   7369
 [25,]   8398   6974   6892   6766   6964   7613
 [26,]   8432   7210   7012   7175   7343   7801
 [27,]   8757   7641   7526   7352   7674   7950
 [28,]   9190   8041   7843   7851   7940   8268
 [29,]   9495   8409   8555   8400   8469   8934
 [30,]   9876   9041   9015   9166   9106   9641
 [31,]  10567   9952   9506   9634   9770  10212
 [32,]  11417  10428  10402  10275  10455  11169
 [33,]  11992  11306  11124  11095  11243  11749
 [34,]  12665  12327  11760  12025  12137  12443
 [35,]  13629  13135  13179  13037  12968  13724
 [36,]  14560  14009  13927  13822  14105  14436
 [37,]  15660  14990  15013  15009  15101  15700
 [38,]  16749  16504  16148  16091  15912  16863
 [39,]  17815  17760  17519  17144  17553  17943
 [40,]  19366  19057  18918  18517  18760  19604
 [41,]  20770  20458  20154  20339  20349  21238
 [42,]  21962  22194  22020  21499  21690  22347
 [43,]  23803  23922  23701  23681  23437  24227
 [44,]  25685  26133  25559  25209  25287  26115
 [45,]  27506  28110  27363  27042  27272  28228
 [46,]  29366  29744  29555  29245  29678  30444
 [47,]  31444  32193  31817  31504  31753  32302
 [48,]  33452  34719  33529  33954  33441  34618
 [49,]  36186  37150  36005  36064  36226  37138
 [50,]  38401  39244  38813  38465  38506  39884
 [51,]  40331  41830  41168  41110  40937  42014
 [52,]  43181  44351  43975  43949  43579  44734
 [53,]  45307  47134  46522  46149  46089  47286
 [54,]  47996  49441  49139  48678  48629  49903
 [55,]  50635  52424  51757  51433  51477  52550
 [56,]  53509  55337  54556  54482  54406  55906
 [57,]  55703  58482  58016  57400  57097  58758
 [58,]  59016  61453  60652  61024  60557  62473
 [59,]  62475  65651  64169  63824  63829  65592
 [60,]  66621  69185  68885  68217  68752  69963
 [61,]  69759  73144  72421  71784  71745  73414
 [62,]  80346  84253  83044  83177  82416  83833
 [63,]  86851  90059  89002  88985  89245  90334
 [64,]  91839  95465  94602  93985  94154  96195
 [65,]  98461 102846 101348 101328 101306 103170
 [66,] 104569 108722 107768 107711 107729 109350
 [67,] 111230 115477 114418 114743 113935 116356
 [68,] 116999 122053 120727 120342 119782 122926
 [69,] 123695 128339 127184 126822 126639 129037
 [70,] 129956 136123 134555 135120 133842 137390
 [71,] 137984 142964 141316 142855 141419 143620
 [72,] 145132 150708 148407 149345 149448 151910
 [73,] 152877 157993 155861 156349 155924 158725
 [74,] 159109 164652 162722 163499 163157 165744
 [75,] 165848 172121 170730 170482 170585 173431
 [76,] 172457 179036 177185 177328 177392 180215
 [77,] 179936 185015 183223 183932 183237 186663
 [78,] 185900 191053 189986 189730 189639 193038
 [79,] 191498 196694 194246 194810 195246 197812
 [80,] 195505 201289 199684 199561 198968 203226
 [81,] 199031 204927 202204 202622 202951 205792
 [82,] 201589 207928 204929 204001 204396 208224
 [83,] 201665 206743 205194 204676 205256 207980
 [84,] 200965 205653 203422 202393 203422 206012
 [85,] 197445 202692 199498 199730 200075 201728
 [86,] 192324 195961 193589 194754 193800 196102
 [87,] 183732 188063 185153 186104 186021 188176
 [88,] 174258 177474 175822 176078 176761 177449
 [89,] 163180 166706 162810 164367 164281 166436
 [90,] 149169 151738 150148 150212 150535 152435
 [91,] 134218 136866 134959 134922 135027 136381
 [92,] 118936 121106 119591 119509 119793 120998
 [93,] 102734 104955 102944 102865 103345 104776
 [94,]  87418  88885  88023  86963  87546  87872
 [95,]  72023  72698  72151  71579  71530  72287
 [96,]  56985  58238  57478  57319  57163  57615
 [97,]  44447  45058  44607  44469  43888  44868
 [98,]  33457  34132  33022  33409  33454  33642
 [99,]  24070  24317  24305  24089  24020  24383
[100,]  17165  17295  16755  17115  16957  17207
[101,]  11799  12125  11709  11816  11824  11719
[102,]   7714   7741   7959   7691   7648   7633
[103,]   5024   5012   4822   4792   4882   4916
[104,]   2987   3101   2978   3049   3093   2906
[105,]   1781   1894   1811   1756   1734   1834

So clearly, for young people, the number of deaths is rather small…

And to visualize it, as above, we can use

> P=D/apply(D,1,sum)*100
> range(P)
[1] 12.34857 17.59386
> dP=trunc((P-min(P))/(max(P)+.01-min(P))*11)
> library(RColorBrewer)
> CLR=rev(brewer.pal(name="RdYlBu", 11))

> plot(0:1,0:1,ylim=c(55,110),xlim=c(-1,7))
> for(i in 1:106){
+   for(j in 1:7){
+  rect(j-1,108-i,j,107-i,col=CLR[dP[i,j]])
+   }}
> text(rep(-.5,106),107.5-1:106,0:105,cex=.4)

As above, we observe a strong difference among weekdays for the date of death for young people (below 30) which disappear after (even if there is still a sunday effect)

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)