Forecasting tools in development

[This article was first published on - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As I’ve been writing up a progress report for my NIGMS R35 MIRA award, I’ve been reminded at how much of the work that we’ve been doing is focused on forecasting infrastructure. A common theme in the Reich Lab is making operational forecasts of infectious disease outbreaks. The operational aspect means that we focus on everything from developing and adapting statistical methods to be used in forecasting applications to thinking about the data science toolkit that you need to store, evaluate, and visualize forecasts. To that end, in addition to working closely with the CDC in their FluSight initiative, we’ve been doing a lot of collaborative work on new R packages and data repositories that I hope will be useful beyond the confines of our lab. Some of these projects are fully operational, used in our production flu forecasts for CDC, and some have even gone through a level of code peer review. Others are in earlier stages of development. My hope is that in putting this list out there (see below the fold) we will generate some interest (and possibly find some new open-source collaborators) for these projects.

Here is a partial list of in-progress software that we’ve been working on:

  • sarimaTD is an R package that serves as a wrapper to some of the ARIMA modeling functionality in the forecast R package. We found that we consistently wanted to be specifying some transformations (T) and differencing (D) in specific ways that we have found useful in modeling infectious disease time-series data, so we made it easy for us and others to use specifications.
  • ForecastFramework is an R package that we have collaborated on with our colleagues at the Johns Hopkins Infectious Disease Dynamics lab. We’ve blogged about this before, and we see a lot of potential in this object-oriented framework for both standardizing how datasets are specified/accessed and how models are generated. That said, there still is a long ways to go to document and make this infrastructure usable by a wide audience. The most success I’ve had using it so far was having PhD students write forecast models for a seminar I taught this spring. I used a single script that could run and score forecasts from each model, with a very simple plug-and-play interface to the models because they had been specified appropriately.
  • Zoltar is a new repository (in alpha-ish release right now) for forecasts that we have been working on over the last year. It was initially designed with our CDC flu forecast use-case in mind, although the forecast structure is quite general, and with predx integration on the way (see next bullet) we are hoping that this will broaden the scope of possible use cases for Zoltar. To help facilitate our and others use of Zoltar, we are working on two interfaces to the Zoltar API, zoltpy for python and zoltr for R. Check out the documentation, especially for zoltr. There is quite a bit of data available!
  • predx is an R package designed my colleague and friend Michael Johansson of the US Centers for Disease Control and Prevention and OutbreakScience. Evan Ray, from the Reich Lab team, has contributed to it as well. The goal of predx is to define some general classes of data for both probabilistic and point forecasts, to better standardize ways that we might want to store and operate on these data.
  • d3-foresight is the main engine behind our interactive forecast visualizations for flu in the US. We have also integrated it with Zoltar, so that you can view forecasts stored in Zoltar (note, kind of a long load time for that last link) using some of the basic d3-foresight functionality.

The lion’s share of the credit for all of the above are due to some combination of Matthew Cornell, Abhinav Tushar, Katie House, and Evan Ray.

To leave a comment for the author, please follow the link and comment on their blog: - R. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)