Calculating Customer Lifetime Value with Recency, Frequency, and Monetary (RFM)

[This article was first published on Data Apple » R Blogs in English, and kindly contributed to R-bloggers.]

Introducing Customer Lifetime Value (CLV)

Customer Lifetime Value is “the present value of the future cash flows attributed to the customer during his/her entire relationship with the company.” [1] There are many formulas for calculating CLV, from simplified to advanced, but the following is probably the most commonly used:

$$ CLV = \sum_{t=1}^{n} \frac{r^{t} \, P_t}{(1 + d)^{t}} $$

where

t is a period, e.g. the first year (t=1), the second year (t=2)

n is the total number of periods the customer will stay before he/she finally churns

r is the retention rate (the probability that the customer is retained in a period)

P_t is the profit the customer will contribute in period t

d is the discount rate

Here we assume that r is constant in the above formula; however, this is not always the case. Factors that influence r include demographics (age, geography, profession, etc.), behavior (Recency, Frequency, Monetary, etc.), tenure, and competition [2]. Some improved formulas forecast r with other approaches, such as logistic regression.
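With a constant r, CLV can be evaluated directly. The following is a minimal sketch, assuming the common form CLV = sum over t = 1..n of r^t * P_t / (1 + d)^t. The function name clv_constant_r and the input values are hypothetical and chosen only for illustration.

```r
# Assumed form: CLV = sum over t = 1..n of r^t * P_t / (1 + d)^t
clv_constant_r <- function(r, profit, d, n) {
  t <- seq_len(n)
  # profit may be a single value or a vector with one entry per period
  sum(r^t * profit / (1 + d)^t)
}

# Hypothetical inputs: 60% retention, 100 dollars per period,
# 2% discount rate, 3 periods
clv_constant_r(r = 0.6, profit = 100, d = 0.02, n = 3)
```

The rest of the article replaces the constant r with a probability that depends on the customer's state in each period.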

In this article, we demonstrate how to calculate a customer’s CLV by predicting the retention (repurchase) rate r in each future purchasing cycle with a logistic regression model that uses Recency, Frequency, and Monetary as predictors.

We will use the full CDNow data set as a concrete case study to build this model.

The CDNow data set can be downloaded here. The sample contains 23570 unique customers who made their first-ever purchase at CDNOW in the first quarter of 1997, and a total of 69659 transaction records from the start of January 1997 to the end of June 1998.

For more details about the data set, please read the paper “Creating an RFM Summary Using Excel” (Peter S. Fader, Bruce G. S. Hardie) or the post “RFM Customer Analysis with R Language” on this website.

Exploring the relationships between Repurchase Rate and Recency, Frequency, and Monetary

First, we count the customers for each Recency value, split each group into “Buy” and “No Buy” according to whether they purchased in the next purchasing cycle, and then compute the percentage of customers at each Recency value who repurchase in the next period. We use the R function “ddply” (from the plyr package) for the grouping and calculation. Below are the pairs of percentage and Recency value we calculated. Note that the smaller the Recency value, the more recent the last purchase.

Recency  Buy  Number  Percentage
0        1    1180    0.45
1        1     581    0.28
2        1     279    0.22
3        1     198    0.17
4        1     163    0.14
5        1     249    0.05
6        1     316    0.03
7        1      13    0.03

The first row means that 45% of the customers who purchased CDs in the most recent period (Recency=0) purchased CDs again in the next period. For this calculation we selected the transactions that took place from Jan 1st, 1997 through Feb 28th, 1998. The purchasing cycle is set to two months.

In the same way, we can get the percentage lists for Frequency and Monetary. The relationships between repurchase rate and Recency, Frequency, and Monetary are plotted below.
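The grouping step can be sketched as follows. This is an illustration only: it uses base R (tapply) rather than ddply so that it runs without extra packages, and the data frame customers, its values, and the column names Customers and Buyers are hypothetical toy inputs, not the CDNow data.

```r
# Toy data (hypothetical): one row per customer, with the Recency observed
# at the end of the history window and whether the customer bought in the
# next purchasing cycle (1 = Buy, 0 = No Buy).
customers <- data.frame(
  Recency = c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2),
  Buy     = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 1)
)

# Group by Recency: total customers, buyers, and share of buyers.
total  <- tapply(customers$Buy, customers$Recency, length)
buyers <- tapply(customers$Buy, customers$Recency, sum)

repurchase <- data.frame(
  Recency    = as.numeric(names(total)),
  Customers  = as.vector(total),
  Buyers     = as.vector(buyers),
  Percentage = round(as.vector(buyers / total), 2)
)
print(repurchase)
```

The same grouping applied to Frequency or Monetary gives the other two percentage lists.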

[Figure: percentage_curves. Scatter plots of repurchase percentage against Recency, Frequency, and Monetary.]

The scatter plots above suggest that the repurchase percentage falls clearly with Recency, in a roughly linear or exponential way, and rises clearly with Frequency, in a roughly exponential way. However, there is no obvious relationship between the repurchase percentage and Monetary.

Building the model

Based on the above observations, we use only Recency and Frequency as predictors in this case and fit a logistic regression model in R.

> model <- glm(Buy ~ Recency + Frequency, family = quasibinomial(link = 'logit'), data = train)

Given a customer’s status of Recency and Frequency, we can predict the probability of repurchasing with the above model.

> pred <- predict(model, data.frame(Recency = c(0), Frequency = c(1)), type = 'response')
> pred
        1
0.2579282

As shown above, a customer, say Tom, who became a new customer in the most recent period (so Recency=0 and Frequency=1) has a 26% probability of purchasing again in the next period (Period 1).
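To try the fit-and-predict steps without the CDNow data, here is a self-contained sketch on simulated data. The data frame train below is generated from an assumed relationship (the coefficients passed to plogis are invented for the simulation), so its predictions will not match the 0.2579282 reported above.

```r
set.seed(42)
n <- 2000

# Simulated stand-in for the training set (hypothetical values)
train <- data.frame(
  Recency   = sample(0:7, n, replace = TRUE),
  Frequency = sample(1:10, n, replace = TRUE)
)

# Assumed "true" relationship, used only to simulate Buy outcomes:
# repurchase gets less likely as Recency grows, more likely with Frequency
p <- plogis(-1 - 0.4 * train$Recency + 0.15 * train$Frequency)
train$Buy <- rbinom(n, size = 1, prob = p)

# Same model specification as in the article
model <- glm(Buy ~ Recency + Frequency,
             family = quasibinomial(link = "logit"), data = train)

# Predicted repurchase probability for a new customer (Recency 0, Frequency 1)
new_customer <- data.frame(Recency = 0, Frequency = 1)
pred <- predict(model, new_customer, type = "response")
print(pred)
```

With type = "response", predict returns a probability rather than a value on the logit scale.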

Calculating CLV

Suppose Tom would remain for 3 more periods before he churns, the average profit he would contribute is 100 dollars in each period in which he purchases, and the discount rate is 0.02. How do we calculate Tom’s CLV?

[Figure: clv_tree. Tree of Tom’s possible Recency and Frequency states in Periods 0 to 3, with transition probabilities.]

As shown in the figure above, the light blue rectangles are Tom’s possible Recency and Frequency states in each period.

In Period 0, his Recency is 0 and his Frequency is 1.

In Period 1, the probability that he buys again is 0.26, as calculated by the model above, and the probability that he does not buy again is 0.74. In the first case, his state would change to Recency=0 and Frequency=2; in the second case, it would change to Recency=1 and Frequency=1. The forecast profit Tom would contribute in Period 1 is 0.26 * 100 / (1+0.02) = 25.5 dollars.

In Period 2, Tom would move to one of four possible states. Taking the leftmost state as an illustration, we first get the transition probability from the model, using his Period 1 state of R=0 and F=2 as input.

> pred <- predict(model, data.frame(Recency = c(0), Frequency = c(2)), type = 'response')
> pred
        1
0.2873471  (about 29%)

Then the probability that Tom purchases in Period 1 and again in Period 2 is 0.26 * 0.29 = 0.08. Likewise, the probability that Tom does not purchase in Period 1 but purchases in Period 2 is 0.14. The forecast profit Tom would contribute in Period 2 is (0.08 + 0.14) * 100 / (1+0.02)^2 = 21.1 dollars.

In Period 3, in the same way, the forecast profit is (0.03 + 0.04 + 0.04 + 0.08) * 100 / (1+0.02)^3 = 17.9 dollars.

Summing the forecast profits in Periods 1, 2, and 3, Tom’s CLV is 64.5 dollars.
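The period-by-period walk through the state tree can be written as a short recursion. This is a sketch under stated assumptions, not the downloadable source code: repurchase_prob is a hypothetical stand-in for predict(model, ..., type = 'response') with invented coefficients, and expected_clv is an illustrative name.

```r
# Hypothetical stand-in for the fitted model's predicted probability.
repurchase_prob <- function(recency, frequency) {
  plogis(-1.2 - 0.5 * recency + 0.15 * frequency)
}

# Expected discounted profit for periods t..periods, given the customer's
# state (recency, frequency) at the end of period t - 1.
expected_clv <- function(recency, frequency, periods, profit, d,
                         prob = repurchase_prob, t = 1) {
  if (t > periods) return(0)
  p <- prob(recency, frequency)
  # Profit expected in period t, discounted to the present
  now <- p * profit / (1 + d)^t
  # Buy: Recency resets to 0 and Frequency increases by 1
  buy <- expected_clv(0, frequency + 1, periods, profit, d, prob, t + 1)
  # No buy: Recency increases by 1 and Frequency is unchanged
  no_buy <- expected_clv(recency + 1, frequency, periods, profit, d, prob, t + 1)
  now + p * buy + (1 - p) * no_buy
}

# Tom's starting state: Recency = 0, Frequency = 1, three future periods
expected_clv(recency = 0, frequency = 1, periods = 3, profit = 100, d = 0.02)

# Cross-check of the article's arithmetic from the per-period
# purchase probabilities 0.26, 0.22, and 0.19
article_clv <- sum(c(0.26, 0.22, 0.19) * 100 / (1 + 0.02)^(1:3))
round(article_clv, 1)
```

Passing a function that wraps the fitted model as prob would apply the same recursion to the real predictions.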

R Source Code

The R source code we used in this article can be downloaded here.

References

1. http://en.wikipedia.org/wiki/Customer_lifetime_value

2. Customer Lifetime Value (CLV) – A Methodology for Quantifying and Managing Future Cash Flows, David C. Ogden

3. Chapter 5, The Migration Model, Segmentation and Lifetime Value Models Using SAS, Edward C. Malthouse

Author: Jack Han. All rights reserved. Reposts must credit the original source and author with a hyperlink.
