Clustering is a very common data mining task and has a wide variety of applications from customer segmentation to grouping of text documents. K-means clustering was one of the examples I used on my blog post introducing R integration back in Tableau 8.1. Many others in Tableau community wrote similar articles explaining how different clustering techniques can be used in Tableau via R integration.
One thing I didn’t see getting much attention was time series clustering and using hierarchical clustering algorithms. So I thought it might be good to cover both in single post.
Let’s say you have multiple time series curves (stock prices, social media activity, temperature readings…) and want to group similar curves together. Series don’t necessarily align perfectly, in fact they could be even about events that are happening at different pace.
There are of course many algorithms to achieve this task but R conveniently offers a package for Dynamic Time Warping. Below is what my calculated field in Tableau looks like.
I started by loading the dtw package, then converted my data from long table format (all time series are in the same column) to wide table format (each series is a separate column).
I then computed distance matrix using dtw as my method and applied hierarchical clustering with average linkage. This gives a tree which I then prune to get the desired number of clusters.
And finally, data is converted back into a long table before being pulled back into Tableau.
Screenshot is from Tableau 10, but don’t worry if you’re not on the Beta program. You can download Tableau 8.1 version of the workbook from HERE. Enjoy!