Mortality by Weekday and Age
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A few days ago, I did mention on Twitter a nice graph, with
Mortality by Weekday and Age https://t.co/LyzQ7nJABZ very interesting difference, young vs. old pic.twitter.com/EfrX0C1GBS
— Arthur Charpentier (@freakonometrics) 27 février 2016
My colleague Jean-Philippe was extremely sceptical, so I tried to reproduce that graph. The good thing is that we have the Social Security Death Master File, for data in the US. To be more specific, I have three big files on my hard drive, and in order to reproduce that graph, we’ll load the data by chunks. But before, because we have the day of birth, and the day of death, I need a function to compute the age. So here it is
> age_years <- function(earlier, later) + { + lt <- data.frame(earlier, later) + age <- as.numeric(format(lt[,2],format="%Y")) - as.numeric(format(lt[,1],format="%Y")) + dayOnLaterYear <- ifelse(format(lt[,1],format="%m-%d")!="02-29", + as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")), + ifelse(as.numeric(format(later,format="%Y")) %% 400 == 0 | as.numeric(format(later,format="%Y")) %% 100 != 0 & as.numeric(format(later,format="%Y")) %% 4 == 0, + as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep="")), + as.Date(paste(format(lt[,2],format="%Y"),"-","02-28",sep="")))) + age[which(dayOnLaterYear > lt$later)] <- age[which(dayOnLaterYear > lt$later)] - 1 + age + }
from github.com/nzcoops. Now, it is possible to create a similar table, based on that huge file (we have almost 50 million observations)
> cols <- c(1,9,20,4,15,15,1,2,2,4,2,2,4,2,5,5,7) > noms_col <- c ("code","ssn","last_name","name_suffix","first_name","middle_name","VorPCode","date_death_m","date_death_d","date_death_y","date_birth_m","date_birth_d","date_birth_y","state","zip_resid","zip_payment","blanks") > library(LaF) > TABLE_AGE_DAY=function(temp = "ssdm3"){ + ssn <- laf_open_fwf( temp,column_widths = cols,column_types=rep("character",length(cols) ),column_names = noms_col,trim = TRUE) + object.size(ssn) + go_through <- seq(1,nrow(ssn),by = 1e05 ) + if(go_through[ length(go_through)] != nrow( ssn)) go_through <- c(go_through,nrow( ssn)) + go_through <- cbind(go_through[-length(go_through)],c(go_through[-c(1,length(go_through)) ]-1,go_through [ length(go_through)])) + go_through + + pb <- txtProgressBar(min = 0, max = nrow( go_through), style = 3) + count_birthday <- function(s){ + #print(s) + setTxtProgressBar(pb, s) + data <- ssn[ go_through[s,1]:go_through[s,2],c("date_death_y","date_death_m","date_death_d", + "date_birth_y","date_birth_m","date_birth_d")] + date1=as.Date(paste(data$date_birth_y,"-",data$date_birth_m,"-",data$date_birth_d,sep=""),"%Y-%m-%d") + date2=as.Date(paste(data$date_death_y,"-",data$date_death_m,"-",data$date_death_d,sep=""),"%Y-%m-%d") + idx=which(!(is.na(date1)|is.na(date2))) + date1=date1[idx] + date2=date2[idx] + itg=try(age<-age_years(date1,date2),silent=TRUE) + if(inherits(itg, "try-error")) age=trunc((date2-date1)/365.25) + w=weekdays(date2) + T=table(age,w) + Tab=matrix(0,106,7) + for(i in 1:nrow(T)) if(as.numeric(rownames(T)[i])<106) Tab[as.numeric(rownames(T)[i]),]=T[i,] + return(Tab) + } + D <- lapply( seq_len(nrow( go_through)),count_birthday) + T=D[[1]] + for(s in 2:length(D)) T=T+D[[s]] + return(T) + }
If we run that function on the three files
> D1=TABLE_AGE_DAY("ssdm1") |========================================| 100% > D2=TABLE_AGE_DAY("ssdm2") |========================================| 100% > D3=TABLE_AGE_DAY("ssdm3") |========================================| 100%
we can visualize not percentages, as on the figure above, but counts
> D=D1+D2+D3 > colnames(D)= c("Sun","Thu","Mon","Tue","Wed","Sat","Fri") > D=D1[, c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")]
and we have here (I remove the Saturday to get a better output)
> D[,1:6] Sun Mon Tue Wed Thu Fri [1,] 2843 2888 2943 3020 2979 3038 [2,] 2007 1866 1918 1974 1990 2137 [3,] 1613 1507 1532 1530 1515 1613 [4,] 1322 1256 1263 1259 1207 1330 [5,] 1155 1061 1092 1128 1112 1171 [6,] 1067 985 950 1082 1009 1055 [7,] 1129 901 915 954 941 1044 [8,] 1026 927 944 935 911 1005 [9,] 1029 1012 871 908 939 998 [10,] 1093 1011 974 958 928 1018 [11,] 1106 1031 1019 1036 1087 1122 [12,] 1289 1219 1176 1215 1141 1292 [13,] 1618 1455 1487 1484 1466 1633 [14,] 2121 2000 1900 1941 1845 2138 [15,] 2949 2647 2519 2499 2524 2748 [16,] 4488 3885 3798 3828 3747 4267 [17,] 5709 4612 4520 4422 4443 5005 [18,] 7280 5618 5400 5271 5344 5986 [19,] 8086 6172 5833 5820 6004 6628 [20,] 8389 6507 6166 6055 6430 6955 [21,] 8794 7038 6794 6628 6841 7572 [22,] 8578 6528 6512 6472 6757 7342 [23,] 8345 6750 6483 6469 6714 7338 [24,] 8361 6859 6589 6623 6854 7369 [25,] 8398 6974 6892 6766 6964 7613 [26,] 8432 7210 7012 7175 7343 7801 [27,] 8757 7641 7526 7352 7674 7950 [28,] 9190 8041 7843 7851 7940 8268 [29,] 9495 8409 8555 8400 8469 8934 [30,] 9876 9041 9015 9166 9106 9641 [31,] 10567 9952 9506 9634 9770 10212 [32,] 11417 10428 10402 10275 10455 11169 [33,] 11992 11306 11124 11095 11243 11749 [34,] 12665 12327 11760 12025 12137 12443 [35,] 13629 13135 13179 13037 12968 13724 [36,] 14560 14009 13927 13822 14105 14436 [37,] 15660 14990 15013 15009 15101 15700 [38,] 16749 16504 16148 16091 15912 16863 [39,] 17815 17760 17519 17144 17553 17943 [40,] 19366 19057 18918 18517 18760 19604 [41,] 20770 20458 20154 20339 20349 21238 [42,] 21962 22194 22020 21499 21690 22347 [43,] 23803 23922 23701 23681 23437 24227 [44,] 25685 26133 25559 25209 25287 26115 [45,] 27506 28110 27363 27042 27272 28228 [46,] 29366 29744 29555 29245 29678 30444 [47,] 31444 32193 31817 31504 31753 32302 [48,] 33452 34719 33529 33954 33441 34618 [49,] 36186 37150 36005 36064 36226 37138 [50,] 38401 39244 38813 38465 38506 39884 [51,] 40331 41830 41168 41110 40937 42014 [52,] 43181 44351 43975 43949 43579 44734 [53,] 45307 47134 46522 46149 46089 47286 [54,] 47996 49441 49139 48678 48629 49903 [55,] 50635 52424 51757 51433 51477 52550 [56,] 53509 55337 54556 54482 54406 55906 [57,] 55703 58482 58016 57400 57097 58758 [58,] 59016 61453 60652 61024 60557 62473 [59,] 62475 65651 64169 63824 63829 65592 [60,] 66621 69185 68885 68217 68752 69963 [61,] 69759 73144 72421 71784 71745 73414 [62,] 80346 84253 83044 83177 82416 83833 [63,] 86851 90059 89002 88985 89245 90334 [64,] 91839 95465 94602 93985 94154 96195 [65,] 98461 102846 101348 101328 101306 103170 [66,] 104569 108722 107768 107711 107729 109350 [67,] 111230 115477 114418 114743 113935 116356 [68,] 116999 122053 120727 120342 119782 122926 [69,] 123695 128339 127184 126822 126639 129037 [70,] 129956 136123 134555 135120 133842 137390 [71,] 137984 142964 141316 142855 141419 143620 [72,] 145132 150708 148407 149345 149448 151910 [73,] 152877 157993 155861 156349 155924 158725 [74,] 159109 164652 162722 163499 163157 165744 [75,] 165848 172121 170730 170482 170585 173431 [76,] 172457 179036 177185 177328 177392 180215 [77,] 179936 185015 183223 183932 183237 186663 [78,] 185900 191053 189986 189730 189639 193038 [79,] 191498 196694 194246 194810 195246 197812 [80,] 195505 201289 199684 199561 198968 203226 [81,] 199031 204927 202204 202622 202951 205792 [82,] 201589 207928 204929 204001 204396 208224 [83,] 201665 206743 205194 204676 205256 207980 [84,] 200965 205653 203422 202393 203422 206012 [85,] 197445 202692 199498 199730 200075 201728 [86,] 192324 195961 193589 194754 193800 196102 [87,] 183732 188063 185153 186104 186021 188176 [88,] 174258 177474 175822 176078 176761 177449 [89,] 163180 166706 162810 164367 164281 166436 [90,] 149169 151738 150148 150212 150535 152435 [91,] 134218 136866 134959 134922 135027 136381 [92,] 118936 121106 119591 119509 119793 120998 [93,] 102734 104955 102944 102865 103345 104776 [94,] 87418 88885 88023 86963 87546 87872 [95,] 72023 72698 72151 71579 71530 72287 [96,] 56985 58238 57478 57319 57163 57615 [97,] 44447 45058 44607 44469 43888 44868 [98,] 33457 34132 33022 33409 33454 33642 [99,] 24070 24317 24305 24089 24020 24383 [100,] 17165 17295 16755 17115 16957 17207 [101,] 11799 12125 11709 11816 11824 11719 [102,] 7714 7741 7959 7691 7648 7633 [103,] 5024 5012 4822 4792 4882 4916 [104,] 2987 3101 2978 3049 3093 2906 [105,] 1781 1894 1811 1756 1734 1834
So clearly, for young people, the number of deaths is rather small…
And to visualize it, as above, we can use
> P=D/apply(D,1,sum)*100 > range(P) [1] 12.34857 17.59386 > dP=trunc((P-min(P))/(max(P)+.01-min(P))*11) > library(RColorBrewer) > CLR=rev(brewer.pal(name="RdYlBu", 11)) > plot(0:1,0:1,ylim=c(55,110),xlim=c(-1,7)) > for(i in 1:106){ + for(j in 1:7){ + rect(j-1,108-i,j,107-i,col=CLR[dP[i,j]]) + }} > text(rep(-.5,106),107.5-1:106,0:105,cex=.4)
As above, we observe a strong difference among weekdays for the date of death for young people (below 30) which disappear after (even if there is still a sunday effect)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.