Sex ratios in all countries from Human Mortality Database

[This article was first published on Ilya Kashnitsky, and kindly contributed to R-bloggers].

Sex ratios reflect two basic regularities of human demography: 1) more boys are born than girls; 2) males experience higher mortality throughout the life course. The sex ratio at birth does not vary dramatically[1] and is more or less constant at 105-106 boys per 100 girls. Hence, differences in the sex ratio profiles of countries mainly reflect the gender gap in mortality. In this post I compare the age profiles of sex ratios in all countries included in the Human Mortality Database (HMD).

R makes it quick and easy to grab data. Thanks to Tim Riffe’s HMDHFDplus package, one can now download HMD data with just a couple of lines of R code.

The HMDHFDplus package has a handy function, getHMDcountries(), which lists the codes for all countries in HMD. This makes it easy to loop through the database and download data for all countries.

```r
# load required packages
library(tidyverse)  # version 1.0.0
library(HMDHFDplus) # version 1.1.8

# character vector of HMD country codes
country <- getHMDcountries()

exposures <- list()
for (i in seq_along(country)) {
        cnt <- country[i]
        exposures[[cnt]] <- readHMDweb(cnt, "Exposures_1x1",
                                       ik_user_hmd, ik_pass_hmd)

        # let's print the progress (print() is needed inside a loop)
        print(paste(i, 'out of', length(country)))
}
```

Please note that the arguments ik_user_hmd and ik_pass_hmd are my login credentials for the Human Mortality Database website, which are stored locally on my computer. To access the data, you need to create an account at http://www.mortality.org/ and provide your own credentials to the readHMDweb() function.
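
One common way to keep credentials out of a script is to read them from environment variables. The following is only a minimal sketch: the helper get_hmd_credentials() and the variable names HMD_USER and HMD_PASS are assumptions made for illustration, not something HMD or HMDHFDplus prescribes.

```r
# Hypothetical helper: read HMD login details from environment variables.
# HMD_USER and HMD_PASS are assumed names; set them e.g. in ~/.Renviron.
get_hmd_credentials <- function(user_var = "HMD_USER", pass_var = "HMD_PASS") {
  user <- Sys.getenv(user_var)
  pass <- Sys.getenv(pass_var)
  # Sys.getenv() returns "" for unset variables, so check for empty strings
  if (!nzchar(user) || !nzchar(pass)) {
    stop("Set ", user_var, " and ", pass_var, " before downloading HMD data.")
  }
  list(user = user, pass = pass)
}

# Possible usage (requires a real HMD account, so it is left commented out):
# cred <- get_hmd_credentials()
# swe  <- readHMDweb("SWE", "Exposures_1x1", cred$user, cred$pass)
```

Failing early with a clear message is arguably friendlier than letting the download call fail on empty credentials.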

Next, I select 2012 for comparison: it is quite recent, and most HMD countries have data for 2012. The loop goes through each country’s data frame in the exposures list, selects the data for 2012, and calculates the sex ratio (males per 100 females) at each age. I also remove several optional populations (such as the separate series for East and West Germany).

```r
sr_age <- list()

for (i in seq_along(exposures)) {
        di <- exposures[[i]]
        sr_agei <- di %>% select(Year, Age, Female, Male) %>%
                filter(Year %in% 2012) %>%
                select(-Year) %>%
                transmute(country = names(exposures)[i],
                          age = Age, sr_age = Male / Female * 100)
        sr_age[[i]] <- sr_agei
}
sr_age <- bind_rows(sr_age)

# remove optional populations
sr_age <- sr_age %>% filter(!country %in% c("FRACNP", "DEUTE", "DEUTW", "GBRCENW", "GBR_NP"))
```
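
The same calculation can also be written without an explicit loop, since dplyr's bind_rows() can stack a named list and store the list names in an id column. The sketch below is an alternative, not the code used for this post, and it runs on a tiny invented list (toy_exposures, with made-up codes AAA and BBB) rather than real HMD data.

```r
library(dplyr)

# Toy stand-in for the `exposures` list: invented numbers, not HMD data
toy_exposures <- list(
  AAA = data.frame(Year = c(2011, 2012, 2012), Age = c(0, 0, 1),
                   Female = c(100, 100, 200), Male = c(104, 105, 210)),
  BBB = data.frame(Year = c(2012, 2012), Age = c(0, 1),
                   Female = c(50, 40), Male = c(53, 38))
)

# Stack all populations, keep 2012, and compute males per 100 females
sr_toy <- bind_rows(toy_exposures, .id = "country") %>%
  filter(Year == 2012) %>%
  transmute(country, age = Age, sr_age = Male / Female * 100)

sr_toy
```

With the real data, toy_exposures would simply be replaced by the exposures list downloaded earlier.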

After age 90, sex ratios become quite jerky because of the relatively small numbers of survivors, so I average the ratios for ages 90 and above into a single value.

```r
# summarize all ages older than 90 (too jerky)
sr_age_90 <- sr_age %>% filter(age %in% 90:110) %>%
        group_by(country) %>% summarise(sr_age = mean(sr_age, na.rm = TRUE)) %>%
        ungroup() %>% transmute(country, age = 90, sr_age)

df_plot <- bind_rows(sr_age %>% filter(!age %in% 90:110), sr_age_90)
```
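
Note that the code above takes the unweighted mean of the single-age ratios. Another option, not used in this post, would be to pool the exposures first and then take one ratio. The small example below uses invented numbers for three ages in a hypothetical population to show that the two approaches can give different values.

```r
# Invented exposures for ages 90, 91 and 92 in one hypothetical population
female <- c(1000, 500, 100)
male   <- c(400, 150, 20)

# Approach used above: unweighted mean of the single-age sex ratios
mean_of_ratios <- mean(male / female * 100)

# Alternative: ratio of pooled exposures, which gives more weight
# to the ages with more person-years
pooled_ratio <- sum(male) / sum(female) * 100

c(mean_of_ratios = mean_of_ratios, pooled_ratio = pooled_ratio)
```

The pooled version is less sensitive to the very small populations at the oldest ages, while the simple mean treats every age equally.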

Finally, I plot the resulting sex ratios.

```r
# get nice font
library(extrafont)
myfont <- "Roboto Condensed"

# finally - plot
# (ggplot2 >= 3.4.0 prefers `linewidth` over `size` for lines and borders)
gg <- ggplot(df_plot, aes(age, sr_age, color = country, group = country)) +
        geom_hline(yintercept = 100, color = 'grey50', size = 1) +
        geom_line(size = 1) +
        scale_y_continuous(limits = c(0, 120), expand = c(0, 0), breaks = seq(0, 120, 20)) +
        scale_x_continuous(limits = c(0, 90), expand = c(0, 0), breaks = seq(0, 80, 20)) +
        xlab('Age') +
        ylab('Sex ratio, males per 100 females') +
        facet_wrap(~country, ncol = 6) +
        theme_minimal(base_family = myfont, base_size = 15) +
        theme(legend.position = 'none',
              panel.border = element_rect(size = .5, fill = NA))
```

[Figure: sex ratio (males per 100 females) by age in 2012, one panel per HMD population]

There is quite a variety in the sex ratio profiles. While the initial surplus of males disappears at around age 60 in Japan, Sweden, or Norway, in Russia, Belarus, and Ukraine this happens at around age 30 because of very high male mortality. In many countries there are pronounced bumps in the sex ratio at ages 20-30, which are likely caused by international migration. For example, Scotland, Northern Ireland, Portugal, and New Zealand are experiencing a substantial outflow of young men.
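
The age at which the male surplus disappears can also be read off numerically rather than from the plot. The helper below, crossover_age(), is a hypothetical function written for illustration; it returns the youngest age at which the ratio falls below 100, which is only a rough indicator because migration bumps can push a profile below 100 and back again. The profile is invented.

```r
# Hypothetical helper: youngest age at which the sex ratio drops below 100.
# Returns NA if the ratio never falls below 100.
crossover_age <- function(age, sr) {
  below <- which(sr < 100)
  if (length(below) == 0) return(NA_real_)
  min(age[below])
}

# Invented sex ratio profile, not HMD data
toy <- data.frame(age = c(0, 20, 40, 60, 80),
                  sr  = c(105, 104, 101, 98, 70))

crossover_age(toy$age, toy$sr)
```

Applied to the plotting data, this could be run per country, for example with group_by(country) followed by summarise(crossover = crossover_age(age, sr_age)).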

What happened in Taiwan?

This post is based on my earlier tweet and gist.

  1. There are cases of big deviations from this natural constant. The best-known one is the skewed sex ratio in China, where decades of the one-child policy, together with a strong traditional son preference, resulted in sex-selective abortions. Read more: Frejka et al. (2010); Wang (2011); Basten and Verropoulou (2013).
