At the Information Management blog, Steve Miller has provided two great reviews (here and here) of last week’s Predictive Analytics World conference, including a recap of the Bay Area User’s Group meeting featuring John Chambers. (My personal highlight from John’s talk? A photograph of the very first sketch of what was to become the S system, which ultimately begat the R language.) But I also wanted to point you to Steve’s review of Mike Driscoll’s jaw-dropping talk, “The Social Effect: Predicting Telecom Customer Churn with Call Data”:
Mike Driscoll’s: The Social Effect: Predicting Telecom Customer Churn with Call Data, was a good illustration of predictive analytics in a larger data warehousing and BI context. Driscoll and his team analyzed billions of calls, millions of records and thousands of defectors from a Greenplum DW looking for predictors of churn. Driscoll’s a big proponent of the open source R Project for Statistical Computing to support his work flow of data munge, data model, and data visualize. And with a Ph.D. in Bioinformatics, he often thinks like an epidemiologist, in this case looking for indications of contagious churn behavior. Using several social network analysis packages available in R, Driscoll’s team appears to have found that churn in an individual’s social network of calling accounts in a given month is likely to lead to more churn in subsequent months, a clear indication of a network effect. That contagion is overwhelmingly the strongest signal the team found from the data. A next step is to work with marketing to intervene on early network churn with email campaigns to minimize losses from the affected networks. I’d love to see the results from those experiments.
Mike’s talk was jaw-dropping for me for two reasons. Firstly, in innovation: using a caller’s social network (defined by the people they call the most, information drawn from the call-log data) was an elegant and powerful way to predict the probability of switching to another provider. It seems intuitive (if your friends switch to another provider, you’re likely to as well), and it’s always good to see intuition borne out by data. And secondly, in scale: we’re talking about 10 GB of data here, analyzed in just a few hours using R. Not too long ago that type of computation would have been reserved for high-tech computing grids and expensive software; these days, it’s amazing what you can do with a hefty 64-bit workstation and open-source tools. Unfortunately Mike’s slides aren’t available just now (there are some great social network charts in there), but if they do become available I’ll let you know.
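To make the idea concrete, here is a minimal sketch of the kind of "churned friends" feature described above. It is not Mike's actual method, and it is written in Python rather than with the R packages his team used. The function names, the choice of the top `k` contacts, and the toy call log are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def top_contacts(call_log, k=3):
    """Map each account to the k accounts it exchanges the most calls with.

    call_log is an iterable of (caller, callee) pairs, one per call.
    Treating a call as an undirected tie is an assumption of this sketch.
    """
    counts = defaultdict(Counter)
    for caller, callee in call_log:
        counts[caller][callee] += 1
        counts[callee][caller] += 1
    return {acct: [c for c, _ in ctr.most_common(k)]
            for acct, ctr in counts.items()}


def churned_neighbor_share(contacts, churned):
    """For each still-active account, the fraction of its top contacts
    that churned in the previous month."""
    return {
        acct: sum(c in churned for c in friends) / len(friends)
        for acct, friends in contacts.items()
        if acct not in churned
    }


# Toy, made-up data: each pair is one call between two accounts.
call_log = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "c"),
            ("d", "e"), ("d", "e"), ("e", "f")]
churned = {"b"}  # accounts that left last month (hypothetical)

contacts = top_contacts(call_log, k=2)
share = churned_neighbor_share(contacts, churned)
print(share)
```

In a real analysis, a share like this would be one input among many to a churn model, and whether it predicts anything is exactly the question the call data has to answer.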
Information Management: Predictive Analytics World – Take 2