Twins on the up

[This article was first published on HighlandR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Are multiple births on the increase?

My twin boys turned 5 years old today. Wow, time flies. Life is never dull, because twins are still seen as something of a novelty, so wherever we go, we find ourselves in conversation with strangers, who are intrigued by the whole thing.

In order to save time if we ever meet, here’s some FAQ’s:

  • No, they’re not identical
  • Yes, I’m sure
  • No, they do not have similar personalities
  • They like different things – One likes Hulk and Gekko, the other likes Iron Man and Catboy.

Recently I’ve been hearing and seeing anecdotal evidence that twins and multiple births are on the increase. I tried to find some data for Scotland, and while there is a lot of information on births in Scotland available, I couldn’t find breakdowns of multiple births. However, I did find some information for England and Wales, so let’s look at that.

In this next bit, they key thing that may be of interest is the use of tidyr::gather. There has been some discussion on #rstats Twitter about things people struggle with and a surprising amount of people struggle to remember the syntax for tidyr’s gather and spread.

(I can neither confirm or deny I am one of them).

The data was found here


data <- read_xls("birthcharacteristicsworkbook2016.xls", sheet = "Table 11",
                 range = "A10:I87")

data <- data %>% 
  rename(Year = X__1,
         All_ages = `All maternities with multiple births`,
         Under20  = X__2,
         `20_to_24` = X__3,
         `25_to_29` = X__4,
         `30_to_34` = X__5,
         `35_to_39` = X__6,
         `40_to_44` = X__7,
         `45_and_over` = X__8)

# the 1981 data is borked, so ignore that

Note use of gather to combine all the age groups into an age_group variable.

We use the Year column as an index so we have an entry for every age group, for every year, with the value represented as ‘maternities’.

Back to the code:

long_data <- data %>% 
  filter(Year != "1981") %>% 
  gather(key = age_group, value = "maternities", -Year)

long_data$Year <- as.numeric(long_data$Year)
long_data$age_group <- forcats::as_factor(long_data$age_group)
long_data$maternities <- as.numeric(long_data$maternities)

ggplot(long_data,aes(Year, maternities), group = age_group) +
  geom_line() +
  geom_point() +
  facet_wrap(vars(age_group), scales = "free_y") +
  ggtitle(label = "England and Wales maternities with multiple births - numbers",
          subtitle = "By age of mother, 1940 to 2016") +
  labs(x = NULL, y = "Multiple maternities")



# Let's do rates

rates <- read_xls("birthcharacteristicsworkbook2016.xls", sheet = "Table 11",
                 range = "A89:I166")

rates <- rates %>% 
  rename(Year = X__1,
         All_ages = `All maternities with multiple births per 1,000 all maternities`,
         Under20  = X__2,
         `20_to_24` = X__3,
         `25_to_29` = X__4,
         `30_to_34` = X__5,
         `35_to_39` = X__6,
         `40_to_44` = X__7,
         `45_and_over` = X__8)

long_rates <- rates %>% 
  filter(Year != 1981) %>% 
  gather(key = age_group, value = "multiple_maternities_per_1000", -Year)

long_rates$Year <- as.numeric(long_rates$Year)
long_rates$age_group <- forcats::as_factor(long_rates$age_group)
long_rates$multiple_maternities_per_1000 <- as.numeric(long_rates$multiple_maternities_per_1000)

ggplot(long_rates,aes(Year, multiple_maternities_per_1000), group = age_group) +
  geom_line() +
  geom_point() +
  facet_wrap(vars(age_group)) +
  ggtitle(label = "England and Wales Rate of maternities with multiple births  - per 1,000 all maternities ",
          subtitle = "By age of mother, 1940 to 2016") +
  labs(x = NULL, y = "Multiple maternities")

When we look at maternities with multiple births as a rate per 1000 maternities, we see the increase in multiple births among older mothers, especially in the over 45 group. 2018-12-13-rates-common.png

Again, with free scales on the y axis - which helps us see almost all age groups are exhibiting an increase - compare the 20-24 age group as a rate and as count for example.


Looks to me that overall, the rate of multiple births is increasing. What’s driving this? Can it continue? Will people ever stop asking us if the twins are identical?


To leave a comment for the author, please follow the link and comment on their blog: HighlandR. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)