Visualising cluster stability using Sankey diagrams

Simon Raper

8 years ago

[This article was first published on Coppelia » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I wanted a way of understanding how a clustering solution will change as more data points are added to the dataset on which it is built. To explain this a bit more, let’s say you’ve built a segmentation on customers, or products, or tweets (something that is likely to increase) using one or other clustering solution, say hierarchical clustering. Sooner or later you’ll want to rebuild this segmentation to incorporate the new data and it would be nice to know how much the segmentation will change as a result. One way of assessing this would be to take the data you have now, roll it back to a previous point in time and then add new chunks of data sequentially each time rebuilding the clustering solution and comparing it to the one before.

To leave a comment for the author, please follow the link and comment on their blog: Coppelia » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.