If, like me, you've ever had a sandwich from a dubious deli and then been laid up for days afterwards, you know that food poisoning is no trifling matter. In the past, local authorities would only learn of such public health issues if they were reported by the victim (or the victim's doctor). But that misses the many less serious cases that never involve a doctor or hospital, as well as illnesses that simply go unreported.
Now, the City of Chicago has found a new way of identifying sources of food poisoning: by analyzing tweets. Foodborne Chicago scans tweets posted in the Chicagoland area, responding to tweets like: "Stomach flu/food poisoning is like eating gas station sushi without the joys of eating gas station sushi" (but ignoring tweets like "It’s really hard to snack while watching Honey Boo Boo. It’s the second best diet to food poisoning."). If you send such a tweet, you're likely to get a response:
— Foodborne Chicago (@foodbornechi) April 16, 2013
Foodborne searches Twitter for all tweets near Chicago containing the string “food poisoning”. The ingestion service consumes thousands of tweets, storing them in a large MongoDB instance. A collection of classification servers, running R, churns through the collected tweets, applying a series of filters. The tweets are classified using a model trained via supervised learning, which determines whether each tweet relates to an actual case of food poisoning or not.
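To make the first stage concrete, here is a minimal Python sketch of the kind of filter described above: keep only tweets that contain the phrase and were posted near Chicago. This is an illustration, not the project's code (which is written in R). The `Tweet` type, the 50 km radius and the city-center coordinates are all assumptions made for the example; the post doesn't say how "near Chicago" is actually defined.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

# Assumed values for illustration only; the real search area is not stated.
CHICAGO = (41.8781, -87.6298)   # approximate city-center (latitude, longitude)
RADIUS_KM = 50.0                # hypothetical search radius
PHRASE = "food poisoning"

@dataclass
class Tweet:
    """Hypothetical minimal tweet record: text plus posting location."""
    text: str
    lat: float
    lon: float

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def is_candidate(tweet):
    """True if the tweet mentions the phrase and was posted within the radius."""
    return (PHRASE in tweet.text.lower()
            and haversine_km(CHICAGO, (tweet.lat, tweet.lon)) <= RADIUS_KM)

# Made-up sample tweets.
tweets = [
    Tweet("Pretty sure that burrito gave me food poisoning", 41.90, -87.65),
    Tweet("Food poisoning ruined my vacation", 40.71, -74.01),  # posted from New York
    Tweet("Great deep dish tonight!", 41.88, -87.63),           # no matching phrase
]
candidates = [t for t in tweets if is_candidate(t)]
```

Only the first sample tweet survives the filter. Everything that passes this stage would then go on to the classifier, which has the harder job of deciding whether the tweet describes a real illness.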
Cory Nissen, the data scientist who implemented the analysis behind the system, shared some of the behind-the-scenes details with me via email. He used an R package called textcat and an algorithm based on n-grams to classify the tweets. The model is tuned to favor sensitivity (90% or higher) at the expense of specificity (50-60%): it catches nearly all genuine food poisoning reports, at the cost of also letting through a fair number of "junk" tweets that merely mention food poisoning. Out of all the tweets in the Chicago area, the system flags about 10-20 a day for review, of which just a couple will typically warrant a response to the unwell citizen for follow-up.
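For readers curious what n-gram classification and the sensitivity/specificity trade-off look like in practice, here is a rough Python sketch. It uses ranked character n-gram profiles compared by an "out-of-place" distance, which, as far as I know, is the style of approach textcat is built around. It is not the Foodborne Chicago model: the training tweets are made up, and the `bias` parameter is a hypothetical knob added here to show how a classifier can be pushed towards sensitivity.

```python
from collections import Counter

def ngram_profile(texts, n_max=3, top=300):
    """Map each of the `top` most frequent character n-grams to its rank (0 = most frequent)."""
    counts = Counter()
    for text in texts:
        padded = f" {text.lower()} "
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, breaking ties alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, category_profile):
    """Sum of rank differences; n-grams absent from the category get a fixed penalty."""
    penalty = len(category_profile)
    return sum(abs(rank - category_profile[g]) if g in category_profile else penalty
               for g, rank in doc_profile.items())

def classify(text, profiles, bias=1.0):
    """Label a tweet 'report' or 'junk'.

    bias > 1 lets 'report' win even when the junk profile is somewhat closer,
    which raises sensitivity and lowers specificity (hypothetical knob).
    """
    doc = ngram_profile([text])
    d_report = out_of_place(doc, profiles["report"])
    d_junk = out_of_place(doc, profiles["junk"])
    return "report" if d_report <= bias * d_junk else "junk"

def sensitivity_specificity(truth, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes appear at least once in `truth`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == "report" and p == "report" for t, p in pairs)
    fn = sum(t == "report" and p == "junk" for t, p in pairs)
    tn = sum(t == "junk" and p == "junk" for t, p in pairs)
    fp = sum(t == "junk" and p == "report" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up training tweets, labeled by hand for illustration.
profiles = {
    "report": ngram_profile([
        "i got food poisoning from the taco place last night",
        "been sick all day, pretty sure it was food poisoning from lunch",
    ]),
    "junk": ngram_profile([
        "this show is the second best diet to food poisoning",
        "that movie was worse than food poisoning lol",
    ]),
}
label = classify("think i got food poisoning from dinner last night", profiles, bias=1.2)
```

With a hand-labeled test set, `sensitivity_specificity` shows the effect of the bias: raising it catches more true reports and also lets more junk through. That is the trade-off described above, where human reviewers weed out the junk from the daily shortlist.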
The open-source R code behind the classifier is available on GitHub. Check out the README file for more technical details behind the implementation. You can also see how the application was presented on Fox 39 Chicago news (starting at the 2:09 mark):
Smart Chicago Collaborative: Foodborne Chicago: Behind the Scenes