Removing Personal Bias From Flu Severity Estimation (a.k.a. Misery Loves Data)

January 2, 2017
By

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

The family got hit pretty hard with the flu right as the Christmas festivities started and we were all pretty much bed-ridden zombies up until today (2017-01-02). When in the throes of a very bad ILI it’s easy to imagine that you’re a victim of a severe outbreak, especially with ancillary data from others that they, too, either just had/have the flu or know others who do. Thankfully, I didn’t have to accept this emotional opine and could turn to the cdcfluview package to see just how this year is measuring up.

Influenza cases are cyclical, and that’s super-easy to see with a longer-view of the CDC national data:

library(cdcfluview)
library(tidyverse)
library(stringi)

flu <- get_flu_data("national", years=2010:2016)

mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>% 
  select(-`AGE 25-64`) %>% 
  gather(age_group, count, starts_with("AGE")) %>% 
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>% 
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(label=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases", 
       title="U.S. National ILI Case Counts by Age Group (2010:2011 flu season through 2016:2017)",
       caption="Source: CDC ILInet via CRAN cdcfluview pacakge") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")

We can use the same data to zoom in on this season:

mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>% 
  select(-`AGE 25-64`) %>% 
  gather(age_group, count, starts_with("AGE")) %>% 
  mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>% 
  mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>%
  filter(week >= as.Date("2016-07-01")) %>% 
  ggplot(aes(week, count, group=age_group)) +
  geom_line(aes(color=age_group)) +
  scale_y_continuous(label=scales::comma, limits=c(0,20000)) +
  facet_wrap(~age_group, scales="free") +
  labs(x=NULL, y="Count of reported ILI cases", 
       title="U.S. National ILI Case Counts by Age Group (2016:2017 flu season)",
       caption="Source: CDC ILInet via CRAN cdcfluview pacakge") +
  ggthemes::scale_color_tableau(name=NULL) +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="none")

So, things are trending up, but how severe is this year compared to others? While looking at the number/percentage of ILI cases is one way to understand severity, another is to look at the mortality rate. The cdcfluview package has a get_mortality_surveillance_data() function, but it’s region-based and I’m really only looking at national data in this post. A helpful individual pointed out a new CSV file at https://www.cdc.gov/flu/weekly/index.htm#MS which we can reproducibly programmatically target (so we don’t have to track filename changes by hand) with:

library(rvest)

pg <- read_html("https://www.cdc.gov/flu/weekly/index.htm#MS")
html_nodes(pg, xpath=".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]") %>% 
  html_attr("href") -> mort_ref
mort_url <- sprintf("https://www.cdc.gov%s", mort_ref)

df <- readr::read_csv(mort_url)

We can, then, take a look at the current “outbreak” status (when real-world mortality events exceed the model threshold):

mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>% 
  select(week, Expected, Threshold, `Percent of Deaths Due to Pneumonia and Influenza`) %>% 
  gather(category, percent, -week) %>% 
  mutate(percent=percent/100) %>% 
  ggplot() +
  geom_line(aes(week, percent, group=category, color=category)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(label=scales::percent) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Pneumonia & Influenza Mortality",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")

That view is for all mortality events from both influenza and pneumonia. We can look at the counts for just influenza as well:

mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>% 
  select(week, `Influenza Deaths`) %>% 
  ggplot() +
  geom_line(aes(week, `Influenza Deaths`), color=ggthemes::tableau_color_pal()(1)) +
  scale_x_date(date_labels="%Y-%U") +
  scale_y_continuous(label=scales::comma) +
  ggthemes::scale_color_tableau(name=NULL) +
  labs(x=NULL, y=NULL, title="U.S. Influenza Mortality (count of mortality events)",
       subtitle="Data through week ending December 10, 2016 as of December 28, 2016",
       caption="Source: National Center for Health Statistics Mortality Surveillance System") +
  hrbrmisc::theme_hrbrmstr(grid="XY") +
  theme(legend.position="bottom")

It’s encouraging that the overall combined mortality rate is trending downwards and that the mortality rate for influenza is very low. Go. Science.

I’ll be adding a function to cdcfluview to retrieve this new data set a bit later in the year.

Hopefully you’ll avoid the flu and enjoy a healthy and prosperous 2017.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)