Increasing Repeat Purchase Rate by Analyzing Customer Latency

August 28, 2013

(This article was first published on Data Apple » R Blogs in English, and kindly contributed to R-bloggers)

For online businesses, Repeat Purchase Rate is one of the critical metrics of business performance. A higher repeat purchase rate means more active members and, in turn, higher profit.

“Customer Latency refers to the average time between customer activity events, for example, making a purchase, calling the help desk, or visiting a web site” [1], said Jim Novo.

In this article, we will demonstrate how to find the right trigger points for marketing campaigns by analyzing Customer Latency, and thereby increase the Repeat Purchase Rate.

Exploring and Preparing the CDNOW Data Set

We will use the CDNOW Sample data during this demonstration; you can download the data here.

The sample data covers 2357 unique customers who made their first-ever purchase at CDNOW in the first quarter of 1997. There are 6919 transaction records in total, which occurred between the start of January 1997 and the end of June 1998.

For more details about the data set, please read the paper “Creating an RFM Summary Using Excel” by Peter S. Fader and Bruce G. S. Hardie.

We will keep the “ID”, “Date”, and “Amount” columns of the original data set and add three columns, “Interval”, “Times”, and “TotalTimes”, so that we can manipulate the data set more conveniently, where

“ID” is the customer ID;

“Date” is the transaction date;

“Amount” is the money amount paid by a customer per transaction;

“Interval” is the Customer Latency, the number of days between a customer’s consecutive transactions, for example 10 days between the first purchase and the second, 15 days between the second and the third, and so on (0 for a first purchase);

“Times” runs from 1 to n: 1 means the customer’s first purchase, 2 the second, and so on;

“TotalTimes” is the total number of transactions for a customer.

The prepared data set looks like the following.

> head(df)

    ID       Date Amount Interval Times TotalTimes
1    4 1997-01-01  29.33        0     1          4
2    4 1997-01-18  29.73       17     2          4
3    4 1997-08-02  14.96      196     3          4
4    4 1997-12-12  26.48      132     4          4
158 18 1997-01-04  14.96        0     1          1
5   21 1997-01-01  63.34        0     1          2
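The three derived columns can be built from “ID” and “Date” alone. The following is a minimal sketch, not necessarily the way the original script does it; it assumes base R only, and the toy data frame `raw` is a hypothetical name holding the rows for customers 4 and 18 copied from the output above.

```r
# Toy transactions copied from the head(df) output (customers 4 and 18)
raw <- data.frame(
  ID     = c(4, 4, 4, 4, 18),
  Date   = as.Date(c("1997-01-01", "1997-01-18", "1997-08-02",
                     "1997-12-12", "1997-01-04")),
  Amount = c(29.33, 29.73, 14.96, 26.48, 14.96)
)

# Sort by customer and date so that consecutive rows are consecutive purchases
raw <- raw[order(raw$ID, raw$Date), ]

# Days since the same customer's previous purchase (0 for the first purchase)
raw$Interval <- ave(as.numeric(raw$Date), raw$ID,
                    FUN = function(d) c(0, diff(d)))

# Purchase sequence number within each customer: 1, 2, 3, ...
raw$Times <- ave(seq_along(raw$ID), raw$ID, FUN = seq_along)

# Total number of purchases made by each customer
raw$TotalTimes <- ave(seq_along(raw$ID), raw$ID, FUN = length)

raw
```

For customer 4 this reproduces the intervals 0, 17, 196, and 132 shown in the table.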

Calculating the Repeat Purchase Rate and Percentages

First of all, let’s examine the average number of purchases per customer and the average amount spent per transaction during that period.

> 6919/2357

[1] 2.935511

> sum(df$Amount)/6919

[1] 35.2785
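The two averages can also be derived from the data frame itself rather than from hard-coded counts. This is a small sketch on a toy data frame (`df_toy` is a hypothetical name, with rows copied from the head(df) output), so its results differ from the full-data figures above.

```r
# Toy rows copied from the head(df) output (customers 4 and 18 only)
df_toy <- data.frame(
  ID     = c(4, 4, 4, 4, 18),
  Amount = c(29.33, 29.73, 14.96, 26.48, 14.96)
)

# Average number of purchases per customer: rows divided by distinct IDs
purchases_per_customer <- nrow(df_toy) / length(unique(df_toy$ID))

# Average amount per transaction
amount_per_transaction <- mean(df_toy$Amount)
```

Computing the denominators from the data avoids stale constants if the data set is filtered or extended.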

Each customer made about 2.9 purchases on average, that is, fewer than two repeat purchases after the first one. Obviously this is not a high rate for an online CD store. Let’s further study the distribution of the total number of purchases per customer.

> # get the table of customer ID ~ the customer’s total number of transactions

> TimesByID <- as.data.frame(table(df$ID))

> # get the table of total number of transactions ~ number of customers with that total

> GroupByTimes <- as.data.frame(table(TimesByID$Freq))

> names(GroupByTimes) <- c("Times", "Customers")

> head(GroupByTimes,12)

   Times Customers
1      1      1205
2      2       406
3      3       208
4      4       150
5      5        98
6      6        56
7      7        65
8      8        35
9      9        23
10    10        21
11    11         8
12    12        10

> plot(GroupByTimes, xlab="Total Number of Purchases", ylab="Number of Customers", pch=16, col="blue", type="o")

> text(2,1220,"1205")

> text(3,425,"406")

> text(4,220,"208")

> text(5,170,"150")

> text(6,120,"98")

> text(12,50,"10")

> text(30,50,"1")

Figure – 1

As we can see from Figure – 1 above, the number of customers decreases very quickly as the total number of purchases increases from 1 to 6. More than half of the customers (1205 of 2357) made only one purchase during the 1.5-year period!
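The two-step tabulation behind this figure can be tried on a small vector. The sketch below assumes base R; the vector `ids` is hypothetical (one entry per transaction), as are the lower-case variable names. It also converts the purchase counts back to numbers, since `table()` returns them as labels.

```r
# Hypothetical customer IDs, one entry per transaction
ids <- c(4, 4, 4, 4, 18, 21, 21)

# Step 1: number of transactions per customer
times_by_id <- as.data.frame(table(ids))

# Step 2: number of customers per transaction count
group_by_times <- as.data.frame(table(times_by_id$Freq),
                                stringsAsFactors = FALSE)
names(group_by_times) <- c("Times", "Customers")

# table() keeps the counts as text labels, so convert them to integers
group_by_times$Times <- as.integer(group_by_times$Times)

group_by_times
```

With a numeric “Times” column, the x axis of a line plot reflects the actual purchase counts rather than the position of each label.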

Let’s examine the percentage of customers making (x) purchases more closely.

> percentages<-round(GroupByTimes$Customers / 2357 , 3)

> percentages

[1] 0.511 0.172 0.088 0.064 0.042 0.024 0.028 0.015 0.010 0.009

[11] 0.003 0.004 0.006 0.003 0.003 0.003 0.004 0.001 0.001 0.000

[21] 0.001 0.002 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000

[31] 0.000 0.000 0.001 0.000 0.000

> x<-barplot(percentages[1:10]*100, col="blue", main="Percentage of Customers Making (x) Purchases", xlab="Number of Purchases", ylab="Repeat Purchase Rate (%)", ylim=c(0,55), axisnames=TRUE, names.arg=GroupByTimes$Times[1:10], cex.names=1)

> text(x, percentages[1:10]*100+2, paste(percentages[1:10]*100,"%"))

Figure – 2

As shown in Figure – 2, which displays the percentages of customers who made 1 to 10 purchases, 51.1% of customers made only one purchase, 17.2% made two purchases, 8.8% made three purchases, and so on.

Based on the above data, to increase the average repeat purchase rate, CDNOW should try to increase the percentage of customers who make more than one purchase, especially those who go on to a second and a third purchase, because the percentage drops very quickly from one purchase to two and from two to three. In the following parts, we will use the Customer Latency concept to find ways to increase the repeat purchase rate.
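One way to make this drop-off concrete is to count customers who made at least k purchases and see what share moves on to the next purchase. The sketch below uses only the counts reported above (2357 customers; 1205, 406, and 208 with exactly one, two, and three purchases); the variable names are hypothetical.

```r
total_customers <- 2357

# Customers with exactly 1, 2, and 3 purchases, from the table above
exactly_k <- c(1205, 406, 208)

# Customers with at least 1, 2, 3, and 4 purchases
at_least_k <- total_customers - c(0, cumsum(exactly_k))

# Share of all customers who bought at least twice
repeat_rate <- at_least_k[2] / total_customers

# Share of customers reaching purchase k who also reach purchase k + 1
progression <- at_least_k[-1] / at_least_k[-length(at_least_k)]
round(progression, 3)
```

On these counts, roughly 49% of customers make a second purchase, about 65% of those make a third, and about 72% of those make a fourth, which suggests the step from the first to the second purchase is where most customers are lost.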

Calculating the Customer Latency and Increasing the Repeat Purchase Rate

Here, Customer Latency refers to the average time between customers’ purchases, for example the average number of days between customers’ first and second purchases, between the second and the third, and so on.

Let’s first calculate the Customer Latency between the 1st and 2nd purchase, since increasing the 2nd purchase rate is important for increasing the overall repeat purchase rate.

> # keep the customers who made more than one purchase, with their intervals between the 1st and the 2nd purchase

> df2<-df[df$TotalTimes>=2 & df$Times==2,]

> # see how many 2nd transactions

> nrow(df2)

[1] 1152

> # get the mean days of customer latency

> mean(df2$Interval)

[1] 105.6276

There are 1152 second purchases in total, and the average customer latency is roughly 100 days (105.6, to be exact).

Let’s take a further look at the distribution of the Customer Latency.

> hist(df2$Interval, main="Distribution of Customer Latency (1st – 2nd purchase)", xlab="Days", ylab="Number of 2nd Transactions")

Figure – 3

As shown in Figure – 3, more than half of the second purchases happened within 50 days of the first purchase, and the distribution declines from left to right.
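Because the distribution has a long right tail, the mean sits well above the typical interval, so the median and the share of purchases within a cutoff may be useful companions to it. The sketch below illustrates this on hypothetical intervals (made-up values, not CDNOW data); on the real data, the same calls would be applied to `df2$Interval`.

```r
# Hypothetical 1st-to-2nd purchase intervals in days (not CDNOW data)
intervals <- c(5, 12, 17, 30, 44, 61, 95, 150, 240, 400)

mean_latency   <- mean(intervals)     # pulled upward by the long right tail
median_latency <- median(intervals)   # half of the second purchases come sooner

# Proportion of second purchases made within 50 days of the first
share_within_50 <- mean(intervals <= 50)

# Upper quantiles give candidate cutoffs for "unusually late" customers
quantile(intervals, c(0.5, 0.75, 0.9))
```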

A customer whose Latency is longer than the average Latency is telling us that something has happened. The customer may have been unhappy with the product or service, or may have had reasons of his own. Either way, the Latency data is speaking to us: “it is a rising of the hand by the customer, and the Data-Driven marketer or service provider not only sees the raised hand, but also reacts to it” [3], as Jim Novo mentioned in his book.

So, based on the above analysis, CDNOW should act to increase the second purchase rate when a customer’s Latency in the database exceeds 50 days, and again when it exceeds 100 days. The action could be an email with a coupon or discount, or something else that entices the customer to come back to CDNOW. Otherwise, the longer the Latency, the more likely the customer is to defect.
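The 50-day and 100-day trigger points could be applied to one-time buyers along the following lines. This is only a sketch: the customers, dates, the reference date `today`, and the action labels "reminder" and "coupon" are all hypothetical.

```r
# Hypothetical one-time buyers and their first purchase dates (not CDNOW data)
first_purchase <- data.frame(
  ID   = c(101, 102, 103),
  Date = as.Date(c("1997-03-20", "1997-02-10", "1997-01-05"))
)

# Hypothetical date on which the campaign list is built
today <- as.Date("1997-04-30")

# Days elapsed since the first purchase, with no second purchase yet
first_purchase$Latency <- as.numeric(today - first_purchase$Date)

# Trigger levels taken from the analysis above: 50 and 100 days
first_purchase$Trigger <- cut(first_purchase$Latency,
                              breaks = c(-Inf, 50, 100, Inf),
                              labels = c("none", "reminder", "coupon"))

first_purchase
```

Here the three customers have waited 41, 79, and 115 days, so they fall into the "none", "reminder", and "coupon" groups respectively.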

In the same way, we can also calculate the average Latency between the second and the third purchase in order to increase the third purchase rate.

Thus the overall average repeat purchase rate is likely to increase.
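The average Latency for every purchase step can be computed in one pass by grouping on “Times”. The sketch below uses a toy data frame (`df_toy` is a hypothetical name): customer 4’s intervals come from the head(df) output, while customer 21’s second interval of 33 days is made up for illustration.

```r
# Customer 4's intervals are from the head(df) output;
# customer 21's second interval (33) is hypothetical
df_toy <- data.frame(
  ID       = c(4, 4, 4, 4, 21, 21),
  Interval = c(0, 17, 196, 132, 0, 33),
  Times    = c(1, 2, 3, 4, 1, 2)
)

# Drop first purchases (their Interval is 0 by construction),
# then average the interval for each purchase number
repeats <- df_toy[df_toy$Times >= 2, ]
latency_by_times <- tapply(repeats$Interval, repeats$Times, mean)

latency_by_times
```

The element named "3" is the average Latency between the second and the third purchase, and so on for later purchases.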

R Source Code

You can download the complete R source code here.


  3. Drilling Down – Turning Customer Data into Profits with a Spreadsheet, Jim Novo, 2004

Author: Jack Han. All rights reserved. Reprints must credit the original source and author with a hyperlink.
