Taking a Shot at cdcfluview v0.7.0 (a.k.a. The Dangers of Relying on ‘Hidden’ APIs)

[This article was first published on R – rud.is, and kindly contributed to R-bloggers.]

Unlike @noamross, I am not an epidemiologist (NOTE: Noam battles pandemics before breakfast, so be super nice to him) but I do like to find kindred methodologies in other disciplines to help foster the growth of cybersecurity into something beyond its current Barnum & Bailey state. I also love finding and exposing hidden APIs and especially enjoy killing Adobe Flash. How does all that relate to cdcfluview?

cdcfluview was one of my first R packages. Someone, somewhere, was trying to do something with Selenium to automate downloading of data from the CDC’s FluView Portal. It was (and some of it still is) a Flash-based site that locked up useful data behind application screens that did little more than burn one’s retinas and force folks to keep Flash alive and, hence, their browsers insecure.

Rather than let the requester suffer under the weight of a pretty significant external dependency, I used the magic of the browser “developer tools” inspector to see that it was making fairly innocuous and useful XHR requests for real data. The package sat on GitHub for a while and eventually made its way to CRAN.

Times change and Flash is dying, so the CDC paid some serious benjamins to have the site re-done in HTML, replicating the horrible UX and terrible visualizations (so. many. pie. charts.). Said revamp also caused changes to the back-end APIs and forced breaking changes. Craig McGowan jumped to the rescue and fixed some core functionality issues, but so much changed — and so much was added — that I felt it was time for a modern re-write of the cdcfluview package.

This is a pretty solid, real-world example of how dangerous it is to rely on hidden APIs. If Craig hadn’t both notified me and gone the extra mile to make a PR, I’d’ve been in the dark until I tried to commiserate (I always seem to get the flu no matter what I do) with code and found my package erroring out.
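One cheap way to hear about this kind of breakage sooner is to fail loudly the moment a response stops looking the way you expect. Here’s a minimal base R sketch of the idea. The check_columns() helper, the column names and the values are all made up for illustration; none of them come from cdcfluview or the CDC’s API.

```r
# Hypothetical helper (not part of cdcfluview): stop with a clear message
# when a parsed response is missing columns the calling code depends on.
check_columns <- function(df, expected) {
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop(
      "Upstream response changed; missing column(s): ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(df)
}

# Stand-in for a parsed API response (placeholder names and values).
resp <- data.frame(week = 40L, region = "National", ili = 1.2)

# Passes silently when the shape is what we expect.
ok <- check_columns(resp, c("week", "region", "ili"))

# Surfaces a readable error when the back end has moved on without us.
msg <- tryCatch(
  check_columns(resp, c("week", "region", "weighted_ili")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

It won’t stop an undocumented endpoint from changing, but it turns a confusing downstream failure into an obvious one at the point of retrieval.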

Enter: cdcfluview 0.7.0.

What’s Different?

Unfortunately, everything; which is one reason I’m writing this post.

First, I’d like folks who are using current-gen cdcfluview to kick the tyres on the new version and let me know (via issues) if they need any old API compatibility back. This isn’t anywhere near the most popular package on CRAN, but it does have users (even, I’m told, within the CDC) and I want to make sure I do as little to disrupt them as possible. But, the current package API maps much more closely to the way the revamped portal works and presents data, so I’m hoping it’s a net gain rather than a crushing blow to productivity.

Speaking of maps, the package now has actual maps! A new cdc_basemap() function returns the GeoJSON files that the CDC uses in their web views as sf objects. And, there are tons of maps and multi-labeled features to tie data to.

Here’s what’s in the tin:

  • age_group_distribution: Age Group Distribution of Influenza Positive Tests Reported by Public Health Laboratories
  • cdc_basemap: Retrieve CDC U.S. Basemaps
  • geographic_spread: State and Territorial Epidemiologists Reports of Geographic Spread of Influenza
  • hospitalizations: Laboratory-Confirmed Influenza Hospitalizations
  • ilinet: Retrieve ILINet Surveillance Data
  • ili_weekly_activity_indicators: Retrieve weekly state-level ILI indicators per-state for a given season
  • pi_mortality: Pneumonia and Influenza Mortality Surveillance
  • state_data_providers: Retrieve metadata about U.S. State CDC Provider Data
  • surveillance_areas: Retrieve a list of valid sub-regions for each surveillance area.
  • who_nrevss: Retrieve WHO/NREVSS Surveillance Data
  • mmwr_week: Convert a Date to an MMWR day+week+year
  • mmwr_weekday: Convert a Date to an MMWR weekday
  • mmwr_week_to_date: Convert an MMWR year+week or year+week+day to a Date object

Plus there’s a new data object mmwrid_map that makes it super-easy to convert arcane MMWR identifiers to real date objects.
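If MMWR weeks are new to you: they run Sunday through Saturday, and week 1 of an MMWR year is the first week with at least four days in that calendar year (i.e., the week containing January 4). Here’s a rough base R sketch of that convention. It is not the package’s implementation, and mmwr_week_sketch() and its output column names are hypothetical, used only here.

```r
# Hypothetical illustration of the MMWR week convention (not cdcfluview code).
# Takes a single date and returns its MMWR year, week and day (1 = Sunday).
mmwr_week_sketch <- function(date) {
  date <- as.Date(date)
  stopifnot(length(date) == 1L, !is.na(date))

  # Sunday on or before January 4: the first day of MMWR week 1 for `year`.
  week1_start <- function(year) {
    jan4 <- as.Date(sprintf("%d-01-04", year))
    jan4 - as.integer(format(jan4, "%w"))  # %w: 0 = Sunday ... 6 = Saturday
  }

  cal_year <- as.integer(format(date, "%Y"))

  # Dates near New Year can belong to the next or the previous MMWR year.
  if (date >= week1_start(cal_year + 1L)) {
    year <- cal_year + 1L
  } else if (date >= week1_start(cal_year)) {
    year <- cal_year
  } else {
    year <- cal_year - 1L
  }

  days_in <- as.integer(date - week1_start(year))
  data.frame(
    mmwr_year = year,
    mmwr_week = days_in %/% 7L + 1L,
    mmwr_day  = as.integer(format(date, "%w")) + 1L
  )
}

mmwr_week_sketch("2017-10-01")  # Sunday that opens week 40 of 2017
mmwr_week_sketch("2014-12-31")  # falls in week 53 of 2014
mmwr_week_sketch("2017-12-31")  # already week 1 of MMWR year 2018
```

The year-boundary cases are the ones that bite, which is a good reason to lean on the package helpers rather than hand-rolling this in analysis code.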

The README has plenty of charts and examples, so I won’t take up post-space with said code or images.

Curiously Enough

Along the way, I was able to discern that there’s a hidden layer of this new, hidden API. Exposing said layer should be as easy as figuring out the right keyword and I’m hoping a bit of fuzzing will do the trick on that. It will be interesting to see what extra data that unlocks. (Yes, I just said relying on hidden APIs is dangerous; and, relying on hidden, hidden APIs is doubly so. I’m just a glutton for punishment.)

I was also able to discern that multiple people or teams worked on this revamp and said folks did not communicate with each other. The per-app APIs are woefully inconsistent. Furthermore, someone goofed and forgot to expose some pretty critical information from a few data retrieval operations (said data is missing from the clickable download versions, too). Hopefully they’ll be addressing the issue soon (the site is technically in beta release).

FIN

If you’ve been a user of cdcfluview please give the new API a try and file issues with anything you see. All contributors — testers, modders, enhancers — will get full DESCRIPTION credit (so, please also include how you’d like to be cited).

Finally, please do check out the CDC FluView Portal. It’s gosh awful horribad. I know there are some spiffy Shiny experts out there who could run rings around that portal and I’ll be glad to add you as a collaborator if you contribute a Shiny app (or two!) to the package. If you’d rather go your own route with a self-contained, self-published package, just let me know what API changes you’d like and I’ll gladly accommodate. The goal is to help epidemiologists and other researchers keep us all safe.

So, go get your flu shot!!! Then, kick the tyres on this package update and don’t hesitate to convey your criticisms, patches or accolades.

Now to get to my promised final review of cyphr (I’ve not forgotten @ma_salmon 😉)
