covidcast package for COVID-19-related data
(This is a PSA post, where I share a package that I think might be of interest to the community but that I haven’t looked into too deeply myself.)
Today I learnt of the covidcast R package, which provides access to the COVIDcast Epidata API published by the Delphi group at Carnegie Mellon University. According to the covidcast R package website,
“This API provides daily access to a range of COVID-related signals that Delphi builds and maintains, from sources like symptom surveys and medical claims data, and also standard signals that we simply mirror, like confirmed cases and deaths.”
(There is a corresponding Python package with similar functionality.) The Delphi group has done a huge amount of work in logging a wide variety of COVID-related data and making it available, along with tools to visualize and make sense of the data.
For those interested in doing COVID-related analyses, I think this is a treasure trove of information for you to use. The covidcast package contains several different types of data (which they call “signals”), including public behavior (e.g. COVID searches on Google), early indicators (e.g. COVID-related doctor visits) and late indicators (e.g. deaths). Documentation on the signals available can be found here. (Note: The data is US-focused right now; I don’t know if there are plans to include data from other countries.)
Let me end off with a simple example showing what you can do with this package. This example is modified from one of the package vignettes; see the Articles section of the package website for more examples.
The package is not available on CRAN yet but can be installed from GitHub:
devtools::install_github("cmu-delphi/covidcast", ref = "main", subdir = "R-packages/covidcast")
The code below pulls data on cumulative COVID cases per 100k people on 2020-12-31 at the county level. covidcast_signal is the function to use for pulling data, and it returns an object of class c("covidcast_signal", "data.frame").
library(covidcast)

# Cumulative COVID cases per 100k people on 2020-12-31
df <- covidcast_signal(data_source = "usa-facts",
                       signal = "confirmed_cumulative_prop",
                       start_day = "2020-12-31", end_day = "2020-12-31")
summary(df)
# A `covidcast_signal` data frame with 3142 rows and 9 columns.
#
# data_source : usa-facts
# signal : confirmed_cumulative_prop
# geo_type : county
#
# first date : 2020-12-31
# last date : 2020-12-31
# median number of geo_values per day : 3142
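Since the returned object also carries the data.frame class, the usual data frame tools should work on it. Here is a minimal sketch using a hand-built stand-in rather than a real API call. The column names geo_value, time_value and value, and all of the numbers, are assumptions made up for illustration; check the package documentation for the actual columns.

```r
# Toy stand-in for the object returned by covidcast_signal().
# Column names are assumed and the values are invented, not real data.
toy <- data.frame(
  geo_value  = c("01001", "01003", "01005"),
  time_value = as.Date("2020-12-31"),
  value      = c(7500, 6200, 4800),
  stringsAsFactors = FALSE
)
class(toy) <- c("covidcast_signal", "data.frame")

# "data.frame" is in the class vector, so data frame operations apply.
is_df <- inherits(toy, "data.frame")

# Keep only the rows with more than 5000 cumulative cases per 100k.
high <- toy[toy$value > 5000, ]

# Sort from the highest to the lowest value.
ranked <- toy[order(-toy$value), ]
```

With the real df from above, the same kind of filtering and sorting should let you pick out the counties with the highest cumulative rates.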
There is a plot method for class covidcast_signal objects:
plot(df)
The automatic plot is usually not bad. The plot method comes with some arguments that the user can use to customize the plot (full documentation here):
breaks <- c(0, 500, 1000, 5000, 10000)
colors <- c("#D3D3D3", "#FEDDA2", "#FD9950", "#C74E32", "#800026")
plot(df, choro_col = colors, choro_params = list(breaks = breaks),
     title = "Cumulative COVID cases per 100k people on 2020-12-31")
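Note that there are five breaks and five colors. My reading is that each break acts as the lower bound of a bin and the last bin is open-ended, but that is an assumption on my part that I have not checked against the package source. Under that assumption, the mapping from values to colors can be sketched in base R with findInterval (the values below are invented, not real data):

```r
breaks <- c(0, 500, 1000, 5000, 10000)
colors <- c("#D3D3D3", "#FEDDA2", "#FD9950", "#C74E32", "#800026")

# Invented example values (cases per 100k), for illustration only.
values <- c(250, 750, 4999, 5000, 12000)

# Assumption: breaks[i] is the lower bound of bin i; the last bin has no upper bound.
bin <- findInterval(values, breaks)

# One row per value, with the bin index and the color it would get.
binned <- data.frame(value = values, bin = bin, color = colors[bin],
                     stringsAsFactors = FALSE)
```

This is only a way to reason about which breaks to choose; the actual binning is done inside the plot method.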
The plot returned is actually created using the ggplot2 package, so it is possible to add your own ggplot2 code on top of it:
library(ggplot2)
plot(df, choro_col = colors, choro_params = list(breaks = breaks),
     title = "Cumulative COVID cases per 100k people on 2020-12-31") +
  theme(title = element_text(face = "bold"))