KDD Cup 2015 winners announced

July 20, 2015

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The KDD Cup is an annual competition to build the best predictive model from a large data set. This year's contest asked entrants to predict the likelihood of a student dropping out of one of XuetangX's massive open online courses, based on the student's prior activities. The competition closed on July 12, and yesterday the winning teams were announced. The winner was team "Intercontinental Ensemble" and the runner-up was "[email protected]".

I couldn't find any details on what techniques were used; more will be revealed, I expect, at the KDD Conference in Sydney. But if you want to get a sense of what it's like to work with these data, take a look at this Data Until I Die blog post from a competitor who got close to the top of the leaderboard. He or she used a Gradient Boosting Model from the H2O R package, and found (amongst other things) that students who had completed prior courses were more likely to complete the next one.
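To make that last finding concrete, here is a minimal sketch of how a "prior completions" feature could be derived and compared against dropout. This is not the competitor's code, and it is not a gradient boosting model. The records, field layout, and function names below are all hypothetical and invented purely for illustration; none of it comes from the KDD Cup 2015 data.

```python
from collections import defaultdict

# Hypothetical toy records, invented for illustration only (NOT KDD Cup data):
# (student_id, course_id, enrollment_order, dropped_out)
ENROLLMENTS = [
    ("s1", "c1", 1, False),
    ("s1", "c2", 2, False),
    ("s2", "c1", 1, True),
    ("s2", "c3", 2, True),
    ("s3", "c2", 1, False),
    ("s3", "c3", 2, True),
    ("s4", "c1", 1, True),
]


def add_prior_completions(enrollments):
    """Return one row per enrollment, with the number of courses
    the same student had completed before that enrollment."""
    completed_so_far = defaultdict(int)
    rows = []
    # Process each student's enrollments in chronological order.
    for student, course, _order, dropped in sorted(
        enrollments, key=lambda r: (r[0], r[2])
    ):
        rows.append(
            {
                "student": student,
                "course": course,
                "prior_completions": completed_so_far[student],
                "dropped": dropped,
            }
        )
        if not dropped:
            completed_so_far[student] += 1
    return rows


def dropout_rate_by_prior(rows):
    """Dropout share, keyed by whether the student had any prior completion."""
    counts = defaultdict(lambda: [0, 0])  # key -> [dropouts, total]
    for row in rows:
        key = row["prior_completions"] > 0
        counts[key][0] += int(row["dropped"])
        counts[key][1] += 1
    return {key: dropouts / total for key, (dropouts, total) in counts.items()}


if __name__ == "__main__":
    rates = dropout_rate_by_prior(add_prior_completions(ENROLLMENTS))
    for has_prior, rate in sorted(rates.items()):
        print(f"prior completion: {has_prior}, dropout rate: {rate:.2f}")
```

With real data, a column like `prior_completions` would be one of many inputs handed to a model such as a GBM. The grouped rates here are only a quick sanity check on whether the feature carries any signal.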


If you'd like to play around with the data yourself, it's no longer available at the KDD Cup site, but it is available in an experiment in Azure ML Studio. If you haven't used Azure ML Studio before, it's free to get started and all you need is a modern web browser (I used Chrome on a Mac). The screenshot below just shows the data munging steps, but later on in the flow a Python node is used to fit a predictive model. (This step-by-step tutorial on analyzing the KDD 2015 data walks you through the steps.) It's easy to add an R node as well, which gives you an R instance with 50 GB of RAM and 8 cores to analyze the data.


For more details on using Azure ML Studio to analyze the KDD Cup data, check out the blog post below.

Technet: Solving the KDD Cup 2015 Challenge Using Azure ML
