Cohort analysis with R – “layer-cake graph” (part 2)

[This article was first published on Analyze Core » R language, and kindly contributed to R-bloggers.]

Let's continue to exploit the great idea of the ‘layer-cake’ graph.

If you liked the approach I shared in the previous post, you may have a question or two that we should answer. Recall the “Total revenue by Cohort” chart:


Since total revenue depends on both the number of customers we attracted and the amount of money each of them spent with us, it makes sense to dig deeper.

The number of active customers can be visualized with the same algorithm we used for total revenue. Again, after processing a large amount of data, the result should have the following structure. Cohort01, Cohort02, etc. are cohort names based on the customer signup date or first purchase date, and M1, M2, etc. are the months in which the cohorts were active (first month, second month, etc.):


For example, Cohort01 signed up in January (M1) and included 11,000 clients who made purchases during that first month. Cohort05 signed up in May (M5), and 1,100 of its clients were active in September (M9).
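As a rough illustration of the processing step mentioned above, here is a minimal base R sketch that turns a raw order log into this wide cohort-by-month layout. The `orders` data frame, its column names, and all values are hypothetical and invented for the example; your own data will have a different shape.

```r
# Hypothetical raw order log: one row per purchase (made-up values)
orders <- data.frame(
  client = c(1, 1, 2, 3, 3, 3, 4),
  cohort = c('Cohort01', 'Cohort01', 'Cohort01',
             'Cohort02', 'Cohort02', 'Cohort02', 'Cohort02'),
  month  = factor(c('M1', 'M2', 'M1', 'M2', 'M2', 'M3', 'M2'),
                  levels = c('M1', 'M2', 'M3'))
)

# Count each client once per cohort/month, however many purchases they made
active <- unique(orders[, c('client', 'cohort', 'month')])
counts <- table(active$cohort, active$month)

# Wide layout: one row per cohort, one column per month
wide <- as.data.frame.matrix(counts)
toy.clients <- data.frame(cohort = rownames(wide), wide)
rownames(toy.clients) <- NULL
print(toy.clients)
```

Client 3 bought twice in M2 but is counted once there, which is the difference between active clients and the number of orders.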

OK. Suppose you have processed your data and obtained the cohort.clients data frame, which looks like the table above. You can replicate this data with the following code:

cohort.clients <- data.frame(cohort=c('Cohort01', 'Cohort02', 'Cohort03', 'Cohort04', 'Cohort05', 'Cohort06', 'Cohort07', 'Cohort08', 'Cohort09', 'Cohort10', 'Cohort11', 'Cohort12'),

Let’s create the “layer-cake” chart with the following R code:

#connect necessary libraries
library(ggplot2)
library(reshape2)

#we need to melt data into long format: one row per cohort/month pair
cohort.chart.cl <- melt(cohort.clients, id.vars = 'cohort')
colnames(cohort.chart.cl) <- c('cohort', 'month', 'clients')

#define palette
reds <- colorRampPalette(c('pink', 'dark red'))

#plot data
p <- ggplot(cohort.chart.cl, aes(x=month, y=clients, group=cohort))
p + geom_area(aes(fill = cohort)) +
 scale_fill_manual(values = reds(nrow(cohort.clients))) +
 ggtitle('Active clients by Cohort')

And we get the second chart:


It seems that many customers purchased once and never came back. This may be why total revenue comes mainly from new customers.
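One way to check this impression numerically is to express each cell as a share of the cohort's size in its signup month. The sketch below is only an illustration: it uses a small made-up table (not the article's data) and assumes, as in the example above, that cohort N signs up in month N, so each cohort's starting size sits on the diagonal.

```r
# Hypothetical active-client counts (made-up numbers, not the article's data).
# Cohort N signs up in month N, so cells before the diagonal are 0.
toy.clients <- data.frame(
  cohort = c('Cohort01', 'Cohort02', 'Cohort03'),
  M1 = c(1000, 0, 0),
  M2 = c(200, 800, 0),
  M3 = c(150, 160, 900)
)

counts <- as.matrix(toy.clients[, -1])
rownames(counts) <- as.character(toy.clients$cohort)

# Cohort size in its signup month sits on the diagonal
signup.size <- diag(counts)

# Each row is divided by its own signup size (the vector recycles down the rows)
retention <- round(counts / signup.size, 2)
print(retention)
```

In this toy table only 20% of Cohort01 is still active in M2, which is the kind of steep drop the chart suggests.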

Finally, we can calculate and visualize the average revenue per client. The R code could look like this:

#we need to divide the data frames (excluding cohort name)
#cohort.sum holds total revenue by cohort, in the same layout as cohort.clients
rev.per.client <- cohort.sum[,c(2:13)]/cohort.clients[,c(2:13)]
#0/0 gives NaN for months before a cohort exists, so replace it with 0
rev.per.client[is.na(rev.per.client)] <- 0
rev.per.client <- cbind(cohort = cohort.sum[,1], rev.per.client)

#define palette
greens <- colorRampPalette(c('light green', 'dark green'))

#melt and plot data
cohort.chart.per.cl <- melt(rev.per.client, id.vars = 'cohort')
colnames(cohort.chart.per.cl) <- c('cohort', 'month', 'average_revenue')
p <- ggplot(cohort.chart.per.cl, aes(x=month, y=average_revenue, group=cohort))
p + geom_area(aes(fill = cohort)) +
 scale_fill_manual(values = greens(nrow(cohort.clients))) +
 ggtitle('Average revenue per client by Cohort')
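
Dividing the revenue table by the clients table produces NaN wherever a cohort has no clients yet (0 / 0), so those cells need to be set to 0 before plotting. A minimal sketch with two made-up 2x2 tables (the names `revenue`, `clients` and `avg` are only for this illustration):

```r
# Hypothetical tables (made-up numbers): rows are cohorts, columns are months
revenue <- data.frame(M1 = c(500, 0), M2 = c(300, 400))
clients <- data.frame(M1 = c(10, 0),  M2 = c(5, 8))

avg <- revenue / clients   # second cohort in M1 is 0 / 0, which gives NaN
avg[is.na(avg)] <- 0       # is.na() is TRUE for NaN as well as NA
print(avg)
```

A cell with revenue but zero clients would give Inf instead, which is.na() does not catch; that should not happen if the two tables are consistent.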

And we get the third chart:


It seems that Cohort02 customers increased their average purchases during months M5-M8. This may be a signal worth investigating.

Note: the last chart shows the average revenue per customer for each cohort, but unlike in the previous two charts, the stacked height is not a meaningful total: it does not show the overall average revenue across all clients. It may look as if the overall average revenue per customer in M12 is about $500, but it is not. Use this chart to compare cohorts, not to sum them up. Please don't be confused.
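
To see why the stacked height overstates the overall figure, here is a small worked example for a single month. The numbers are made up for illustration and are not taken from the charts; the overall average is total revenue divided by total clients, not the sum of the per-cohort averages.

```r
# Hypothetical figures for one month and three cohorts (made-up numbers)
revenue <- c(Cohort01 = 3000, Cohort02 = 4000, Cohort03 = 27000)
clients <- c(Cohort01 = 150,  Cohort02 = 160,  Cohort03 = 900)

per.cohort     <- revenue / clients           # what each layer of the chart shows
stacked.height <- sum(per.cohort)             # top edge of the stacked chart
overall        <- sum(revenue) / sum(clients) # overall average revenue per client

print(per.cohort)
print(c(stacked = stacked.height, overall = overall))
```

Here the layers are 20, 25 and 30, so the chart tops out at 75, while the overall average per client is about 28.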

Have questions? Don’t hesitate!
