Cohort analysis with R – “layer-cake graph”

(This article was first published on Analyze Core » R language, and kindly contributed to R-bloggers)

Cohort analysis is one of the most powerful and in-demand techniques available to marketers for assessing long-term trends in customer retention and calculating lifetime value.

If you have gone through Custora’s University, you may be interested in the “layer-cake graph” they propose for cohort analysis.

Custora says: “The distinctive ‘layer-cake graph’ produced by looking at cohorts in calendar time can provide powerful insights into the health of your business. At a given point in time, what percentage of your revenue or profit came from new vs. repeat customers? Tracking how that ratio has changed over time can give you insight into whether you’re fueling top-line growth solely through new customer acquisition – or whether you’re continuing to nurture those relationships with your existing customers over time.”

Usually we focus on calculating lifetime value or comparing cohorts, but I was really impressed by this analytical approach and tried to build the same chart in R. Here is what I got.

After processing the raw data, we should end up with the following structure. Cohort01, Cohort02, etc. are cohort names based on customer signup date or first purchase date, and M1, M2, etc. are calendar months (first month, second month, etc.):


For example, Cohort01 signed up in January (M1) and brought us $270,000 during that first month. Cohort05 signed up in May (M5) and brought us $31,000 in September (M9).

OK. Suppose you have processed your data and ended up with a cohort.sum data frame that looks like the table above. You can replicate this data with the following code:

cohort.sum <- data.frame(cohort=c('Cohort01', 'Cohort02', 'Cohort03', 'Cohort04', 'Cohort05', 'Cohort06', 'Cohort07', 'Cohort08', 'Cohort09', 'Cohort10', 'Cohort11', 'Cohort12'),
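The listing above is cut off after the cohort column, so the original revenue figures are not available here. As a stand-in, the following minimal sketch builds a data frame of the same shape (12 cohorts, columns M1 to M12). Everything in it is an assumption made for illustration: the name cohort.demo, the starting revenue, and the decay rate are hypothetical, and the numbers are synthetic placeholders, not the article's data.

```r
# Synthetic placeholder data with the same layout as cohort.sum.
# NOTE: these values are invented for illustration only.
n <- 12
cohort.names <- sprintf('Cohort%02d', 1:n)   # 'Cohort01' ... 'Cohort12'
month.names  <- paste0('M', 1:n)             # 'M1' ... 'M12'

start.revenue <- 100000   # arbitrary first-month revenue for every cohort
decay         <- 0.8      # arbitrary month-over-month retention of revenue

# Row i = cohort i, column j = calendar month j.
# A cohort earns nothing before its signup month (j < i).
rev.matrix <- outer(1:n, 1:n, function(i, j) {
  ifelse(j >= i, round(start.revenue * decay^(j - i)), 0)
})

cohort.demo <- data.frame(cohort = cohort.names, rev.matrix)
colnames(cohort.demo) <- c('cohort', month.names)
```

Assigning cohort.demo to cohort.sum should let the plotting code run end to end, although the resulting chart will of course not match the one in the article.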

Let’s create the “layer-cake” chart with the following R code:

#connect necessary libraries
library(ggplot2)
library(reshape2)

#we need to melt data (wide table -> one row per cohort and month)
cohort.chart <- melt(cohort.sum, id.vars = "cohort")
colnames(cohort.chart) <- c('cohort', 'month', 'revenue')

#define palette
blues <- colorRampPalette(c('lightblue', 'darkblue'))

#plot data
p <- ggplot(cohort.chart, aes(x=month, y=revenue, group=cohort))
p + geom_area(aes(fill = cohort)) +
 scale_fill_manual(values = blues(nrow(cohort.sum))) +
 ggtitle('Total revenue by Cohort')
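To see what the melt step produces, here is a minimal base-R sketch of the same wide-to-long reshaping on a tiny table. The toy data frame and its numbers are made up for illustration. It is not a replacement for melt, only a way to inspect the long format that ggplot expects.

```r
# Tiny made-up wide table: one row per cohort, one column per month
toy <- data.frame(cohort = c('Cohort01', 'Cohort02'),
                  M1 = c(100, 0),
                  M2 = c(60, 120),
                  stringsAsFactors = FALSE)

month.cols <- names(toy)[-1]

# Long format: one row per (cohort, month) pair, stacked column by column
toy.long <- data.frame(
  cohort  = rep(toy$cohort, times = length(month.cols)),
  # keep the original column order as factor levels
  month   = factor(rep(month.cols, each = nrow(toy)), levels = month.cols),
  revenue = unlist(toy[month.cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(toy.long)
```

The month column is a factor whose levels follow the column order of the wide table, which is also what melt does. That ordering matters for the chart: with plain alphabetical sorting, M10 would be placed before M2 on the x axis.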

And we get this chart:


It looks like there was a promo in the eighth month (M8) and a few cohorts responded. A really useful graph.
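The new vs. repeat split that Custora mentions can be computed from the same kind of table. The sketch below assumes, as in the example above, that cohort k signs up in month k, so the diagonal of the table is revenue from new customers. The toy data frame and its values are made up for illustration.

```r
# Made-up 3-cohort table (not the article's data)
toy <- data.frame(cohort = c('Cohort01', 'Cohort02', 'Cohort03'),
                  M1 = c(100,   0,  0),
                  M2 = c( 60, 120,  0),
                  M3 = c( 40,  70, 90))

revenue <- as.matrix(toy[, -1])

# Assumption: cohort k signs up in month k, so the diagonal
# holds the revenue that each month's new customers brought in
new.revenue   <- diag(revenue)
total.revenue <- colSums(revenue)

new.share    <- new.revenue / total.revenue   # share from new customers
repeat.share <- 1 - new.share                 # share from existing customers

print(round(new.share, 2))
```

In this toy table the share of revenue from new customers falls from 100% in M1 to 45% in M3, which is the kind of trend the layer-cake graph makes visible.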

Although the R code looks pretty simple, I spent most of my time aggregating the data. I can’t offer universal R code for that task, as the structure of your initial data can be completely different.
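Still, as a rough starting point, here is one way the aggregation could look for a simple transaction log. This is only a sketch under stated assumptions: the tx data frame, its column names (customer, date, amount), and its values are hypothetical, and a cohort is defined here as the calendar month of a customer's first purchase.

```r
# Hypothetical transaction log: one row per purchase
tx <- data.frame(
  customer = c('a', 'a', 'b', 'b', 'c', 'a'),
  date     = as.Date(c('2013-01-05', '2013-02-10', '2013-02-03',
                       '2013-03-15', '2013-03-20', '2013-03-25')),
  amount   = c(50, 30, 40, 20, 60, 10),
  stringsAsFactors = FALSE
)

# Calendar month of each purchase, e.g. '2013-01'
tx$month <- format(tx$date, '%Y-%m')

# Cohort = month of the customer's first purchase
# ('YYYY-MM' strings sort chronologically, so min() works)
tx$cohort <- ave(tx$month, tx$customer, FUN = min)

# Sum revenue for every cohort / calendar-month combination
cohort.table <- xtabs(amount ~ cohort + month, data = tx)

# Wide data frame with a leading cohort column
cohort.wide <- data.frame(cohort = rownames(cohort.table),
                          as.data.frame.matrix(cohort.table),
                          check.names = FALSE)
print(cohort.wide)
```

Months before a cohort's first purchase come out as zeros, which matches the triangular shape of the table used for the chart. Real data will usually need extra steps (refunds, currencies, missing months) that this sketch ignores.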

Have questions? You are welcome!
