Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In order to illustrate hierarchical clustering techniques and k-means, I did borrow François Husson‘s dataset, with monthly average temperature in several French cities.

> temp=read.table(
+ "http://freakonometrics.free.fr/FR_temp.txt",
+ header=TRUE,dec=",")

We have 15 cities, with monthly observations

> X=temp[,1:12]
> boxplot(X)

Since the variance seems to be rather stable, we will not ‘normalize’ the variables here,

> apply(X,2,sd)
Janv     Fevr     Mars     Avri
2.007296 1.868409 1.529083 1.414820
Mai     Juin     juil     Aout
1.504596 1.793507 2.128939 2.011988
Sept     Octo     Nove     Dece
1.848114 1.829988 1.803753 1.958449

In order to get a hierarchical cluster analysis, use for instance

> h <- hclust(dist(X), method = "ward")
> plot(h, labels = rownames(X), sub = "")

An alternative is to use

> library(FactoMineR)
> h2=HCPC(X)
> plot(h2)

Here, we visualise observations with a principal components analysis. We have here also an automatic selection of the number of classes, here 3. We can get the description of the groups using

> h2$desc.ind or directly > cah=hclust(dist(X)) > groups.3 <- cutree(cah,3) We can also visualise those classes by ourselves, > acp=PCA(X,scale.unit=FALSE) > plot(acp$ind$coord[,1:2],col="white") > text(acp$ind$coord[,1],acp$ind$coord[,2], + rownames(acp$ind$coord),col=groups.3) It is possible to plot the centroïds of those clusters > PT=aggregate(acp$ind$coord,list(groups.3),mean) > points(PT$Dim.1,PT$Dim.2,pch=19) If we add Voroid sets around those centroïds, here we do not see them (actually, we see the point – in the middle – that is exactly at the intersection of the three regions), > library(tripack) > V <- voronoi.mosaic(PT$Dim.1,PT$Dim.2) > plot(V,add=TRUE) To visualize those regions, use > p=function(x,y){ + which.min((PT$Dim.1-x)^2+(PT$Dim.2-y)^2) + } > vx=seq(-10,12,length=251) > vy=seq(-6,8,length=251) > z=outer(vx,vy,Vectorize(p)) > image(vx,vy,z,col=c(rgb(1,0,0,.2), + rgb(0,1,0,.2),rgb(0,0,1,.2))) > CL=c("red","black","blue") > text(acp$ind$coord[,1],acp$ind$coord[,2], + rownames(acp$ind$coord),col=CL[groups.3]) Actually, those three groups (and those three regions) are also the ones we obtain using a k-mean algorithm, > km=kmeans(acp$ind$coord[,1:2],3) > km K-means clustering with 3 clusters of sizes 3, 7, 5 (etc). But actually, since again we have some spatial data, it is possible to visualize them on a map > library(maps) > map("france") > points(temp$Long,temp$Lati,col=groups.3,pch=19) or, to visualize the regions, use e.g. > library(car) > for(i in 1:3) + dataEllipse(temp$Long[groups.3==i],
+ col=i+1,fill=TRUE)