Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Nonnegative matrix factorization (NMF) is a popular tool in many applications, such as image and text recognition. If you’ve ever wanted to learn a little bit about NMF, you can do so right here, in this blog post, which will summarize the (slightly) longer presentation here. The R package NMF will be used as illustration.

Given a u x v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices W (u x k) and H (k x v) such that

A ≈ WH

In other words, NMF is a form of dimension reduction.

Note that this means that column j of A is approximately a linear combination of the columns of W, with the coefficients being column j of H. W thus forms what I call a pseudobasis for the columns of A.

The larger the rank k, the better our approximation . But we typically hope that a good approximation can be achieved with k << v.

The matrices W and H are calculated iteratively, with one of the major methods being linear regression. Here is how:

We make initial guesses for W and H, say with random numbers. Now consider an odd-numbered iteration. Suppose just for a moment that we know the exact value of W, with H unknown. Then for each j we could “predict” column j of A from the columns of W. The coefficient vector returned by lm() will become column j of H. We do this for j = 1,2,…,v.

In even-numbered iterations, suppose we know H but not W. We could take transposes,

A’ ≈ H’ W’

and predict row i of A from the rows of H. Then we alternate until we reach convergence.

CRAN’s NMF package for NMF computation is quite versatile, with many, many options. In its simplest form, though, it is quite easy to use. For a matrix a and desired rank k, we simply run

 > nout <- nmf(a,k)


The factors are then in [email protected]@W and [email protected]@H.

Though NMF is often used for image classification, with input data consisting of many images, here we will have only one image,

and we’ll use NMF to compress it, not do classification. We first obtain A, then call nmf(), then plot the product of the factors:

> library(pixmap)
> a <- [email protected]
> aout <- nmf(a,50)
> w <- [email protected]@W
> h <- [email protected]@H
> approxa <- w %*% h
# brightness values must be in [0,1]
> approxa <- pmin(approxa,1)
> mtrnew <- mtr
> [email protected] <- approxa
> plot(mtrnew)


The result is

This is understandably blurry. The original matrix has dimension 194 x 259, and thus presumably has rank 194. (This is confirmed for instance by running the function rankMatrix() in the Matrix package.) We’ve approximated the matrix by one of rank only 50, with a 75% storage savings. This is not important for one small picture, but possibly worthwhile if we have many large ones. The approximation is not bad in that light, and may be good enough for image recognition or other applications.

Indeed, in many if not most applications of NMF, we need to worry about overfitting, which in this context amounts to using too high a value for our rank, something to be avoided.