Eigenimages: The AT&T Cambridge Faces Database

[This article was first published on BioStatMatt » R, and kindly contributed to R-bloggers].

I picked up the AT&T Laboratories Cambridge database of faces for a clustering application. The database consists of images of 40 distinct subjects, each in 10 different facial positions and expressions, for 400 images in total. Typically, the goal of clustering in these data is to recover the ‘true’ partition, that is, the one that isolates images of distinct subjects. Each image is 92 x 112 pixels, with grayscale integer values in the 8-bit range (0 to 255). Images of such high dimension (92 x 112 = 10304 pixels each) are difficult to work with directly. We can look to data-squashing to help here. (Actually, I’m not sure the term ‘data-squashing’ was intended for methods like PCA, but it seems appropriate to me.)

I used principal components analysis to identify a set of rotated pixels (linear combinations of the original pixels) that are highly variable, and presumably most useful for discriminating between the images. The result is this interesting image montage. The first 20 eigenimages (in reading order) each represent the rotation of a 92 x 112 grayscale image onto a single pixel. Darker regions in the eigenimages load higher in the rotation. Consequently, darker regions are important for discriminating between images in the dataset. The component shown in the top-left image (the first principal component) accounts for about 18% of the variability in the entire dataset. In other words, the dark regions of that image mark the parts of the face that may be the most useful for facial recognition.
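To make the rotation idea concrete, here is a minimal sketch of the same kind of computation using base R's prcomp. It is not the code from ATTfaces.R: the data are random numbers standing in for the faces, the images are shrunk to 8 x 10 pixels to keep it quick, and the names X, var_explained, and eigenimage1 are made up for illustration.

```r
# Synthetic stand-in for the face data (an assumption, not the real images):
# 40 "images" of 8 x 10 pixels, one image per row, values in the 8-bit range.
set.seed(1)
n_img  <- 40
width  <- 8
height <- 10
X <- matrix(sample(0:255, n_img * width * height, replace = TRUE),
            nrow = n_img)

# PCA on the pixel columns; prcomp centers each column by default.
pca <- prcomp(X)

# Proportion of the total variance accounted for by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Each column of pca$rotation holds one loading per pixel, so it can be
# reshaped to the image dimensions and viewed as an eigenimage.
eigenimage1 <- matrix(pca$rotation[, 1], nrow = width, ncol = height)
```

With the real data, X would have 400 rows and 10304 columns, and var_explained[1] would be the analogue of the 18% figure quoted above. Something like image(eigenimage1, col = gray(seq(1, 0, length.out = 256))) would draw the loadings with larger values darker, though getting the orientation right for display may take a transpose or a flip.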

I’ve put together an archive containing the images and the R code to read the PGM image pixels into R, do the PCA, and recreate the graphic above, all in fewer than 60 lines (though I shouldn’t boast, else someone will cut it to 20 lines and shame me). You can download the archive here: ATTfaces.tar.gz (please be patient, ~3.7MB). From a shell prompt, recreate the graphic as follows:

$ tar -xvzf ATTfaces.tar.gz
$ R -q
> source("ATTfaces.R")
> pcaPlot()
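For readers curious about the PGM-reading step, here is a rough sketch of how a binary PGM file could be read with base R alone. The function read_pgm is a hypothetical helper, not the one in ATTfaces.R, and it assumes a binary (P5) file with an 8-bit maximum value and no comment lines in the header. The example writes a tiny 3 x 2 image to a temporary file so that it runs without the archive.

```r
# Hypothetical helper (not the function from ATTfaces.R). Assumes a binary
# (P5) PGM with maxval <= 255 and no comment lines in the header.
read_pgm <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))

  # The header is four whitespace-separated text tokens (magic number,
  # width, height, maxval) followed by a single whitespace byte.
  is_ws <- bytes %in% as.raw(c(9, 10, 13, 32))
  token_end  <- which(!is_ws[-length(bytes)] & is_ws[-1])
  header_end <- token_end[4] + 1
  header <- scan(text = rawToChar(bytes[seq_len(header_end)]),
                 what = "", quiet = TRUE)
  stopifnot(header[1] == "P5", as.integer(header[4]) <= 255)

  width  <- as.integer(header[2])
  height <- as.integer(header[3])

  # Pixels follow the header, one byte each, stored row by row.
  pixels <- as.integer(bytes[header_end + seq_len(width * height)])
  matrix(pixels, nrow = height, ncol = width, byrow = TRUE)
}

# Write a tiny 3 x 2 test image to a temporary file and read it back.
tmp <- tempfile(fileext = ".pgm")
con <- file(tmp, "wb")
writeBin(charToRaw("P5\n3 2\n255\n"), con)
writeBin(as.raw(c(0, 128, 255, 10, 20, 30)), con)
close(con)

img <- read_pgm(tmp)
```

Flattening each image with as.vector and stacking the results as rows would give the images-by-pixels matrix that the PCA works on.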

Disclaimer: This image is a re-posting from my old website. However, the code and discussion were not given before.
