Using survival models for marketing attribution

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

by Andrie de Vries

Prior to joining Revolution Analytics in March this year, I spent several years in the field of market research and survey analytics. During this period, I spent a few months consulting for a digital marketing agency based in London. My role was to help them develop their capability to build customer surveys and integrate these into social media sites, including Facebook and Twitter.

A substantial part of the agency's business was placing client advertisements on websites, including social media. The worldwide online advertising market is estimated at $100B per year. This is a fast-moving, competitive and innovative industry, where tiny improvements in the clickthrough rate (CTR) and conversion rate can lead to large improvements in the effectiveness of campaigns.

Given the large amounts of online advertising spend, and the competitive nature of the industry, it is increasingly important to make sure each advertising dollar is spent in the right way. In the marketing world, this question of where to spend money is often referred to as attribution, since the question is how to attribute value. In other words, how much did each interaction with an advertisement contribute to sales conversion?

For many media agencies, however, attributing value is still a challenge. As a result, it's not uncommon for companies to use the simplest of all models, last-click attribution, where all value is attributed to the last click.
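The last-click model described above can be sketched in a few lines of Python. This is only an illustration: the channel names, the conversion values and the helper `last_click_attribution` are all hypothetical and do not come from the agency's data.

```python
from collections import defaultdict

# Hypothetical data: for each converting user, the ordered list of ad
# touchpoints and the value of the resulting sale. All names and numbers
# here are invented for illustration.
paths = [
    (["display", "social", "search"], 100.0),
    (["social", "search"], 50.0),
    (["display", "social"], 80.0),
]


def last_click_attribution(paths):
    """Give the full value of each conversion to its final touchpoint."""
    credit = defaultdict(float)
    for touchpoints, value in paths:
        if touchpoints:  # ignore conversions with no recorded touchpoints
            credit[touchpoints[-1]] += value
    return dict(credit)


credit = last_click_attribution(paths)
print(credit)  # {'search': 150.0, 'social': 80.0}
```

Note that "display" receives no credit at all, even though it appears in two of the three converting paths. That is the weakness of last-click attribution.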

There has to be a better way, and when I was asked to build a statistical model for attribution, my immediate thought was to use logistic regression. My reasoning was that one should be able to compare the clickstreams (or impression streams) of people who did convert (i.e. clicked through the final ad) with those of people who didn't.

It turns out this is not a trivial problem. In my case, I only had information about the converters, and none about the non-converters. Since logistic regression needs examples of both outcomes, it was not appropriate. This baffled me, until I saw a reference to how Datasong (previously Upstream) used survival models for marketing attribution. This made me search for more information on the topic, and I came across a PhD thesis by John Chandler-Pepelnjak, published in 2010, titled "Modelling conversions in online advertising".
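To give a feel for what a survival model works with, here is a minimal sketch of the Kaplan-Meier estimator, a standard non-parametric survival estimate, applied to time-to-conversion. The post does not say which survival model was actually used, so this is an assumption chosen for illustration. The durations are invented, and the function `kaplan_meier` is a hypothetical helper rather than part of any library. A real attribution model would also need covariates describing the ad exposures, which this sketch leaves out.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time until conversion (or until observation stopped)
    observed:  True if the conversion was seen, False if censored
    Returns a list of (time, probability of not yet having converted).
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)  # users still unconverted and under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all users who share the same time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical days from first ad impression to conversion. Every user here
# is a converter, mirroring a data set with no non-converters.
days = [1, 2, 2, 4, 5]
converted = [True, True, True, True, True]

curve = kaplan_meier(days, converted)
for day, prob in curve:
    print(day, round(prob, 3))
```

The key difference from logistic regression is the response variable: the model describes how long conversion takes, not whether it happens, so a data set made up only of converters still carries usable information.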

With my interest piqued, I investigated how to use survival analysis for marketing attribution.

During the last week, I presented (summarised and anonymised) results of my work on three occasions.

It was very satisfying to share my ideas with people, and this led to many new ideas and questions. In a follow-up blog post I plan to highlight some questions, insights and novel applications of survival analysis outside the traditional fields of medicine and engineering.
