**R-english – Freakonometrics**, and kindly contributed to R-bloggers)

In order to illustrate hierarchical clustering techniques and *k*-means, I did borrow François Husson‘s dataset, with monthly average temperature in several French cities.

> temp=read.table( + "http://freakonometrics.free.fr/FR_temp.txt", + header=TRUE,dec=",")

We have 15 cities, with monthly observations

> X=temp[,1:12] > boxplot(X)

Since the variance seems to be rather stable, we will not ‘normalize’ the variables here,

> apply(X,2,sd) Janv Fevr Mars Avri 2.007296 1.868409 1.529083 1.414820 Mai Juin juil Aout 1.504596 1.793507 2.128939 2.011988 Sept Octo Nove Dece 1.848114 1.829988 1.803753 1.958449

In order to get a hierarchical cluster analysis, use for instance

> h <- hclust(dist(X), method = "ward") > plot(h, labels = rownames(X), sub = "")

An alternative is to use

> library(FactoMineR) > h2=HCPC(X) > plot(h2)

Here, we visualise observations with a principal components analysis. We have here also an automatic selection of the number of classes, here 3. We can get the description of the groups using

> h2$desc.ind

or directly

> cah=hclust(dist(X)) > groups.3 <- cutree(cah,3)

We can also visualise those classes by ourselves,

> acp=PCA(X,scale.unit=FALSE) > plot(acp$ind$coord[,1:2],col="white") > text(acp$ind$coord[,1],acp$ind$coord[,2], + rownames(acp$ind$coord),col=groups.3)

It is possible to plot the centroïds of those clusters

> PT=aggregate(acp$ind$coord,list(groups.3),mean) > points(PT$Dim.1,PT$Dim.2,pch=19)

If we add Voroid sets around those centroïds, here we do not see them (actually, we see the point – in the middle – that is exactly at the intersection of the three regions),

> library(tripack) > V <- voronoi.mosaic(PT$Dim.1,PT$Dim.2) > plot(V,add=TRUE)

To visualize those regions, use

> p=function(x,y){ + which.min((PT$Dim.1-x)^2+(PT$Dim.2-y)^2) + } > vx=seq(-10,12,length=251) > vy=seq(-6,8,length=251) > z=outer(vx,vy,Vectorize(p)) > image(vx,vy,z,col=c(rgb(1,0,0,.2), + rgb(0,1,0,.2),rgb(0,0,1,.2))) > CL=c("red","black","blue") > text(acp$ind$coord[,1],acp$ind$coord[,2], + rownames(acp$ind$coord),col=CL[groups.3])

Actually, those three groups (and those three regions) are also the ones we obtain using a *k*-mean algorithm,

> km=kmeans(acp$ind$coord[,1:2],3) > km K-means clustering with 3 clusters of sizes 3, 7, 5

(etc). But actually, since again we have some spatial data, it is possible to visualize them on a map

> library(maps) > map("france") > points(temp$Long,temp$Lati,col=groups.3,pch=19)

or, to visualize the regions, use e.g.

> library(car) > for(i in 1:3) + dataEllipse(temp$Long[groups.3==i], + temp$Lati[groups.3==i], levels=.7,add=TRUE, + col=i+1,fill=TRUE)

Those three regions actually make sense, geographically speaking.

**leave a comment**for the author, please follow the link and comment on their blog:

**R-english – Freakonometrics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...