The City of Chicago uses R to issue beach safety alerts

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

Among the many interesting talks I saw at the Domino Data Science Pop-Up in Chicago earlier this week was the presentation by Gene Lynes and Nick Lucius from the City of Chicago. The City of Chicago Tech Plan encourages smart communities and open government, and as part of that initiative the city has undertaken dozens of open-source, open-data projects in areas such as food safety inspections, preventing the spread of West Nile virus, and keeping sidewalks clear of snow.

This talk was on the Clear Water initiative, a project to monitor the water quality of Chicago's many public beaches on Lake Michigan, and to issue safety alerts (or in serious cases, beach closures) when E. coli levels in the water get too high. The problem is that E. coli levels can change rapidly: they can be normal for weeks, and then spike for a single day. But traditional culture tests take many days to return results, and while rapid DNA-based tests do exist, it's not possible to conduct these tests daily at every beach.

The solution is to build a predictive model, which uses meteorological data and rapid DNA tests for some beaches, combined with historical (culture-based) evaluations of water quality, to predict E. coli levels at all beaches every day. The analysis is performed using R (you can find the R code at this GitHub repository).
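To make the idea concrete, here is a minimal sketch in base R of predicting an untested beach from same-day rapid readings at tested beaches plus weather. Everything in it is an assumption for illustration: the data are simulated, the column names (`rapid_beach_a`, `rapid_beach_b`, `rain_mm`, `exceeded`) are hypothetical, and logistic regression is only a simple stand-in, not necessarily the model the city uses. The real code is in the repository mentioned above.

```r
# Simulated data only: none of these numbers come from Chicago's beaches.
set.seed(42)
n <- 300
history <- data.frame(
  rapid_beach_a = rnorm(n),             # hypothetical rapid DNA reading, beach A
  rapid_beach_b = rnorm(n),             # hypothetical rapid DNA reading, beach B
  rain_mm       = rexp(n, rate = 0.2)   # hypothetical rainfall
)

# Simulate the later culture-test outcome at an untested beach
# (1 = E. coli above the alert level on that day).
risk <- plogis(-3 + 1.2 * history$rapid_beach_a +
                    0.8 * history$rapid_beach_b +
                    0.05 * history$rain_mm)
history$exceeded <- rbinom(n, size = 1, prob = risk)

# Fit on historical days, where the slow culture result is already known.
fit <- glm(exceeded ~ rapid_beach_a + rapid_beach_b + rain_mm,
           data = history, family = binomial())

# Score a new day using only information available that morning.
today <- data.frame(rapid_beach_a = 1.5, rapid_beach_b = 0.7, rain_mm = 12)
p_today <- predict(fit, newdata = today, type = "response")
print(p_today)
```

The point of the sketch is the data flow rather than the model family: slow culture results serve as training labels, while fast readings and weather serve as same-day inputs.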

The analysis was developed in conjunction with citizen scientists at Chi Hack Night and statisticians from DePaul University. In 2017, the model was piloted in production in Chicago to issue beach safety alerts and to create a live map of beach water quality. This new R-based model predicted 60 additional occurrences of poor water quality, compared with the process used in prior years.

Still, water quality is hard to predict: once the slower culture-test results are in and the predictions can be checked against them, the model's accuracy rate is 38%, with fewer than 2% false alarms. (The city plans to use clustering techniques to further improve that number.) The model uses rapid DNA testing at five beaches to predict water quality at all of the city's beaches along Lake Michigan. A Shiny app (linked below) lets you explore how testing at a different set of beaches, or adjusting the false positive rate, affects the overall accuracy of the model.
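The trade-off between catching bad days and raising false alarms can be sketched in a few lines of R. This assumes the accuracy figure is the share of confirmed poor-quality days that received an alert, and the false alarm figure is the share of safe days that were flagged; the post does not define either precisely. The helper `alert_rates()` and the toy vectors are hypothetical and are not Chicago data.

```r
# Hypothetical helper: summarise alerts against later culture-test results.
#   score  : model output per beach-day (higher = more likely unsafe)
#   unsafe : TRUE where the culture test later confirmed poor water quality
#   cutoff : scores at or above this value trigger an alert
alert_rates <- function(score, unsafe, cutoff) {
  alert <- score >= cutoff
  c(
    hit_rate         = sum(alert & unsafe) / sum(unsafe),    # unsafe days caught
    false_alarm_rate = sum(alert & !unsafe) / sum(!unsafe)   # safe days flagged
  )
}

# Made-up toy values, for illustration only.
score  <- c(0.9, 0.2, 0.6, 0.1, 0.4, 0.8, 0.3, 0.05)
unsafe <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

strict  <- alert_rates(score, unsafe, cutoff = 0.85)
lenient <- alert_rates(score, unsafe, cutoff = 0.5)
print(rbind(strict, lenient))
```

With these toy values, the strict cutoff catches one of three unsafe days with no false alarms, while the lenient cutoff catches two of three at the cost of flagging one of five safe days. That is the same kind of tension the Shiny app lets you explore by adjusting the false positive rate.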

Chicago Beaches

You can find more details about the City of Chicago Clear Water initiative at the link below.

City of Chicago: Clear Water
