Clustering Frankenstein

Posted on August 5, 2019 by @aschinchon in R bloggers | 0 Comments

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Necesito para estar sentado, un arbolito en este descampado (Desarraigo, Extremoduro)

From time to time I come back to experiment with this stunning photograph of Boris Karloff as Frankenstein’s monster. I have done several of them previously: from decomposing it into Voronoi regions, to draw it as a single line portrait using an algorithm to solve the travelling salesman problem. I also used this last technique to do a pencil portrait of the image. Today I will use a machine learning algorithm to reinterpret the monster once again. Concretely, I will use hierarchical clustering to do drawings like this one:

The idea is simple: once loaded the photograph, the first step is to binarize it into a black and white image using thresold function of imager package. After that, a random sample of black points is taken. Here comes the clustering algorithm, which starts measuring the euclidean distance between each pair of points. Then, a hierarchical clustering is done so I can reproduce how points are gathered walking through the resulting dendrogram of the previous clustering, starting from the maximum number of clusters (each cluster is an individual point) and ending with the minimum one (just one cluster with the whole sample). The next image shows and example of this process for a sample of 25 points. The left plot shows the population of points and the right one the way that points are connected once the dendrogram is analyzed following the steps described before:

Applying this technique to a big amount of points (between 2.000 and 5.000) result in very interesting drawings. To make process faster, I used map function from purrr package. To render the graph I use ggplot function with geom_curve. This geometry draws a curve between two points, named (x, y) and (xend, yend) respectively. Among others, there are two important parameters to control its shape: curvature (negative values produce left-hand curves, positive values produce right-hand curves, and zero produces a straight line) and angle (values less than 90 skew the curve towards the start point and values greater than 90 skew the curve towards the end point). Playing with this paramaters, as well as with the sample size, you can generate a wide variety of drawings (note that here only appear the segments, since now I removed the points of their extremes):