New Package — {cdccovidview} — To Work with the U.S. CDC’s New COVID-19 Trackers: COVIDView and COVID-NET
The United States Centers for Disease Control and Prevention (CDC from now on) has set up two new public surveillance resources for COVID-19. Together, COVIDView and COVID-NET provide weekly surveillance data similar to what FluView provides for influenza-like illnesses (ILI).
The COVIDView resources are HTML tables (O_O) and, while the COVID-NET interface provides a “download” button, there is no exposed API to make it easier for the epidemiological community to work with these datasets.
Enter {cdccovidview} (https://cinc.rud.is/web/packages/cdccovidview/), which scrapes the tables and uses the hidden API in the same way {cdcfluview} (https://cran.rstudio.com/web/packages/cdcfluview/index.html) does for the FluView data.
Weekly case, hospitalization, and mortality data are available at the national, state, and regional levels (where provided), and I tried to normalize the fields across each of the tables/datasets. (I hate to pick on them when they’re down, but these two sites are seriously sub-optimal from a UX and general usage perspective.)
After you follow the above URL for information on how to install the package, it should “just work”. No API keys are needed, but the CDC may change the table layouts and the field structure of the hidden API at any time, so keep an eye out for updates.
Using it is pretty simple: call one of the functions to grab the data you want and then work with it.
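Since the upstream tables and fields can change without notice, one cheap safeguard is to check that a downloaded data frame still has the columns your code relies on. The following is a minimal sketch, not something the package provides: `missing_columns()` is a hypothetical helper, the expected column names are copied from the `laboratory_confirmed_hospitalizations()` output in this post, and `hosp_like` is a hand-made stand-in so the example runs offline.

```r
# Hypothetical helper (not part of {cdccovidview}): return the expected
# column names that are absent from a data frame.
missing_columns <- function(df, expected) {
  setdiff(expected, names(df))
}

# Column names as they appear in the example output in this post.
expected_cols <- c(
  "catchment", "network", "year", "mmwr_year", "mmwr_week",
  "age_category", "cumulative_rate", "weekly_rate"
)

# Hand-made stand-in for a real download (assumption: same column names).
hosp_like <- data.frame(
  catchment = "Entire Network",
  network = "COVID-NET",
  year = "2020",
  mmwr_year = "2020",
  mmwr_week = "13",
  age_category = "0-4 yr",
  cumulative_rate = 0.3,
  weekly_rate = 0.3,
  stringsAsFactors = FALSE
)

# All expected columns present: nothing is reported as missing.
missing_ok <- missing_columns(hosp_like, expected_cols)

# Simulate an upstream change that drops a field.
changed <- hosp_like[, setdiff(names(hosp_like), "weekly_rate")]
missing_bad <- missing_columns(changed, expected_cols)

if (length(missing_bad) > 0) {
  message("Missing columns: ", paste(missing_bad, collapse = ", "))
}
```

In real use you would pass the tibble returned by the package function instead of `hosp_like`, and stop or warn before plotting if anything is missing.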
    library(cdccovidview)
    library(hrbrthemes)
    library(tidyverse)

    hosp <- laboratory_confirmed_hospitalizations()

    hosp
    ## # A tibble: 4,590 x 8
    ##    catchment      network   year  mmwr_year mmwr_week age_category cumulative_rate weekly_rate
    ##    <chr>          <chr>     <chr> <chr>     <chr>     <chr>                  <dbl>       <dbl>
    ##  1 Entire Network COVID-NET 2020  2020      10        0-4 yr                   0           0
    ##  2 Entire Network COVID-NET 2020  2020      11        0-4 yr                   0           0
    ##  3 Entire Network COVID-NET 2020  2020      12        0-4 yr                   0           0
    ##  4 Entire Network COVID-NET 2020  2020      13        0-4 yr                   0.3         0.3
    ##  5 Entire Network COVID-NET 2020  2020      14        0-4 yr                   0.6         0.3
    ##  6 Entire Network COVID-NET 2020  2020      15        0-4 yr                  NA          NA
    ##  7 Entire Network COVID-NET 2020  2020      16        0-4 yr                  NA          NA
    ##  8 Entire Network COVID-NET 2020  2020      17        0-4 yr                  NA          NA
    ##  9 Entire Network COVID-NET 2020  2020      18        0-4 yr                  NA          NA
    ## 10 Entire Network COVID-NET 2020  2020      19        0-4 yr                  NA          NA
    ## # … with 4,580 more rows

    c(
      "0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr", "65-74 yr", "75-84 yr", "85+"
    ) -> age_f

    mutate(hosp, start = mmwr_week_to_date(mmwr_year, mmwr_week)) %>%
      filter(!is.na(weekly_rate)) %>%
      filter(catchment == "Entire Network") %>%
      select(start, network, age_category, weekly_rate) %>%
      filter(age_category != "Overall") %>%
      mutate(age_category = factor(age_category, levels = age_f)) %>%
      ggplot() +
      geom_line(
        aes(start, weekly_rate)
      ) +
      scale_x_date(
        date_breaks = "2 weeks", date_labels = "%b\n%d"
      ) +
      facet_grid(network~age_category) +
      labs(
        x = NULL, y = "Rates per 100,000 pop",
        title = "COVID-NET Weekly Rates by Network and Age Group",
        caption = sprintf("Source: COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention.\n<https://gis.cdc.gov/grasp/COVIDNet/COVID19_3.html>; Accessed on %s", Sys.Date())
      ) +
      theme_ipsum_es(grid="XY")
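The plot above relies on `mmwr_week_to_date()` to turn an MMWR year/week pair into a date. For readers unfamiliar with MMWR weeks, here is a rough base-R sketch of the idea. It assumes the standard definition (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the calendar year, which means January 4 always falls in week 1). `mmwr_week_start()` is a hypothetical name and this is not the package's implementation.

```r
# Hypothetical base-R helper, for illustration only; the package's own
# mmwr_week_to_date() may differ in details.
mmwr_week_start <- function(year, week) {
  # The tibble stores these fields as character, so coerce first.
  year <- as.integer(year)
  week <- as.integer(week)

  # January 4 is always inside MMWR week 1.
  jan4 <- as.Date(sprintf("%d-01-04", year))

  # as.POSIXlt()$wday is 0 for Sunday, so subtracting it steps back to the
  # Sunday on or before January 4, i.e. the first day of week 1.
  week1_start <- jan4 - as.POSIXlt(jan4)$wday

  # Each later week starts 7 days after the previous one.
  week1_start + 7 * (week - 1)
}

# MMWR week 10 of 2020 starts on Sunday, 2020-03-01.
mmwr_week_start("2020", "10")
```

Seeing the arithmetic spelled out makes it easier to sanity-check the x-axis dates in the chart against the `mmwr_week` values in the data.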
FIN
This is brand new and — as noted — things may change or break due to CDC site changes. I may have also missed a table or two (it’s a truly terrible site).
If you notice things are missing or would like a different interface to various data endpoints, drop an issue or PR wherever you’re most comfortable.
Stay safe!