Peter Aldhous and Jim Giles, from New Scientist's San Francisco bureau, are looking for a statistician and R user to take part in an interesting data analysis challenge and to be featured in a future article in the magazine. They were inspired by a rather tongue-in-cheek presentation in which Sebastian Wernicke analyzed videos, transcripts and ratings of TED talks to conclude, for example, that a talk about how “French coffee spreads happiness in your brain” would be the “ultimate TED talk”.
As the hook for an upcoming article about analytics and predictions, New Scientist will soon be running a competition along similar lines: can you use information extracted from the covers of New Scientist magazines (headline, subheadline, photograph, main colors, article titles, etc.) to predict newsstand sales? Competitors will be given historical sales data to build their models, and can use the extracted cover details (or extract their own details from the provided cover images), plus any other publicly accessible data (weather records, for example), to help boost their predictions. Teams will be ranked weekly on their ability to generate the best predictions of future newsstand sales in the lead-up to the published article.
Several teams with expertise in domains ranging from machine learning to sentiment analysis and neural networks (and even a pair of trained pigeons!) have already been assembled, but as yet no one is using the power of R for the analysis. R is ideally suited to this kind of problem, so if you'd like to give it a shot, and can spend some time over the next few months building novel models that can outperform the other teams, New Scientist would like to hear from you. I've offered to help Peter and Jim find a candidate (or team) for the competition, so if you'd like to participate, send me an email with some notes on how you'd tackle the problem. C'mon, you just know you can beat those pigeons.
Incidentally, New Scientist is no stranger to R: previous articles have featured analyses or graphics done in the R environment. A few examples follow; most of the linked articles require a New Scientist subscription to view.
In the article “Inside the Stem Cell Wars”, Cox proportional hazards regression in R was used to examine differences in time-to-acceptance and time-to-publication between papers from US-based and non-US-based scientists in the hottest area of stem cell research. The Kaplan-Meier curve in the article (below) was created in R and then enhanced in Adobe Illustrator.
Another example comes from the article “Hey green spender: The truth about eco-friendly brands”. The interactive infographic in the screenshot below was based on scatterplots made in R. R was also used for Kruskal-Wallis tests and subsequent multiple comparisons, and for Spearman rank correlations: see the exact methods used.