Predict which shoppers will become repeat buyers

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by James P. Peruvankal

Kaggle just announced a competition to predict which shoppers will become repeat buyers. To aid with algorithmic development, they have provided complete, basket-level, pre-offer shopping history for a large set of shoppers who were targeted for an acquisition campaign. Files containing the incentives offered to each shopper as well as their post-incentive behavior are also provided.

This challenge provides almost 350 million rows of completely anonymised transactional data from over 300,000 shoppers. It is one of the largest problems run on Kaggle to date. Once unzipped, data size will be 22GB, more than what can fit into the memory of usual laptops.

If you like this sort of thing, a first look at the data ought to captivate your interest. The following plots shows the number of repeated trips to the store plotted against the offer value in dollars on the x axis. The data are shaded by market, a geographical area.


To get your own first look at the data, and maybe try out a few of the fast Parallel External Memory Algorithms included in Revolution R Enterprise, you might find it helpful to take advantage of Revolution Analytics offer to try out Revolution R Enterprise in the AWS cloud. (If you spin up a Linux box in AWS, you can go up to 64GB RAM.)

This contest is representative of the challenge coping with the exponential growth in real-world data projects. I am sure, we will see more of these kind of problems.

In addition to trying Revolution R Enterprise in the cloud, active Kaggle competitors can download the full-featured Revolution R Enterprise software and use it for free to create their own submissions.

Some of us Revolutionaries are jumping into the fray. See you at the competition!

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)