Last night at the Bay Area UseR Group meeting, Peter Aldhous, San Francisco Bureau Chief of New Scientist magazine, gave an inspiring presentation about Data Driven Journalism. Even though the newspaper industry's business model is faltering, there's a beacon of light: journalists can be the driving force in bringing the meaning of the huge data sets now available to a wider audience. In an age of data, this is what journalism could become. Even the inventor of the World Wide Web, Tim Berners-Lee, says that analyzing data is the future for journalists.
In his talk, Peter gave several examples of data journalism in print and online media. I was surprised to learn that the pioneering example comes not from the last few years but from several decades ago: the Pulitzer Prize-winning reportage on the 1967 Detroit riot by Philip Meyer. There, a well-designed follow-up survey dispelled some myths about the rioters, showing for example that college graduates were as likely to have rioted as high-school dropouts. Some more recent examples presented by Peter included a Guardian investigation of the Afghanistan Wikileaks data dump, and a Seattle Times investigation into the relationship between clear-cutting and devastating landslides.
Several examples were implemented using R, at least in part. Through conversations with New York Times Graphics Editor Amanda Cox, Peter confirmed that several of the interactive graphics featured in Amanda's presentation, which we noted last week, did indeed involve R. These include the Michael Jackson Billboard rankings chart, the Mariano Rivera baseball story, and a decision tree (created using the rpart package) on primary voters in the 2008 Obama-Clinton race. A 2007 feature article in Barron's, revealing that the only way to profit from the advice of CNBC financial “guru” Jim Cramer was to short his recommendations, was based on financial analysis in R, as described by journalist Bill Alpert in an R News article (see p. 34) and by his statistical adviser Patrick Burns (of R Inferno fame). Peter also described his own use of R for data journalism in New Scientist, which we reviewed in an earlier post.
Peter has graciously made his slides available for download — check them out and follow the links to many of the examples mentioned above. Video of Peter's presentation will also be available soon (check back here for an update with the link).
Bay Area UseR Group: Data Driven Journalism