Matrix factorizations and social network graph analysis

December 9, 2013
By

(This article was first published on Cartesian Faith » R, and kindly contributed to R-bloggers)

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently these lectures will not always be as rigorous as they could be.

The LU decomposition

Matrix factorizations along with spectral theory are two areas of linear algebra that are heavily used in practice. While there are numerous types of factorizations, it is more important to understand the basic principles of factorization and how to interpret the factorized matrices versus learning the mechanics of each type of factorization. For this lecture we’ll explore the significance of one of the simplest factorizations, the LU factorization, and how it differs from eigenvalue problems. We’ll then apply this knowledge to the problem of centrality in a social network.

The concept behind the LU factorization is fairly simple. A paraphrased definition is that given a matrix, it is sometimes possible to represent the matrix as the product of a lower triangular matrix and an upper triangular matrix. As discussed in Section 5.1 of Kuttler this is only possible if the original matrix can be transformed without switching any of the rows. If the matrix is square, it turns out that this restriction isn’t so restrictive since a permutation matrix can be left-multiplied to the original matrix which then makes it decomposable.

Kuttler introduces the multiplier procedure immediately, but I think it’s illustrative to develop some intuition first. Here is a modified example from Wikipedia, which hopefully is easier to understand. As with many mathematical problems, it is usually easier to start with the solution and work backwards to the problem. In this case we can begin by looking at the general case of a matrix that has been factorized into the product of lower and upper triangular matrices.

\begin{bmatrix}  4 & 3 \\  6 & 4  \end{bmatrix} = \begin{bmatrix}   l_{11} & 0\\  l_{21} & l_{22}  \end{bmatrix} \begin{bmatrix}   u_{11} & u_{12}\\  0 & u_{22}  \end{bmatrix}   = \begin{bmatrix}   l_{11} u_{11} & l_{11} u_{12} \\  l_{21} u_{11} & l_{21} u_{12} + l_{22} u_{22}  \end{bmatrix}

Solving for the unknowns yields \begin{array}{rl}  l_{11} u_{11} &= 4 \\  l_{21} u_{11} &= 6 \\  l_{11} u_{12} &= 3 \\  l_{21} u_{12} + l_{22} u_{22} &= 4  \end{array} . Since this system is underdetermined, there are an infinite number of solutions. Remember from the last post that the solution set is a line or plane (depending on the dimensions) when underdetermined. The key to making it determined is not so different from when we removed an equation from our system to make it underdetermined. In fact it is the opposite operation, i.e. we want to choose some values for some of the variables to make the solution set unique. This is why the diagonal of L is composed of all ones, which reduces to

\begin{array}{rl}  u_{11} &= 4 \\  l_{21} u_{11} &= 6 \\  u_{12} &= 3 \\  l_{21} u_{12} + u_{22} &= 4  \end{array} .

Therefore L = \begin{bmatrix}1 & 0 \\ 1.5 & 1\end{bmatrix} and U = \begin{bmatrix}4 & 3 \\ 0 & -0.5\end{bmatrix}

In R we don’t have to go through this mechanical process as the multiplier algorithm credited to Dolittle is already implemented for us.

 
install.packages('matrixcalc')
library(matrixcalc)
A <- matrix(c(4,6,3,4), nrow=2)
lu.decomposition(A)

$L
     [,1] [,2]
[1,]  1.0    0
[2,]  1.5    1

$U
     [,1] [,2]
[1,]    4  3.0
[2,]    0 -0.5

The LU factorization is useful primarily for mechanical reasons. What I mean by this is that the resultant L and U matrices don’t really contain any underlying meaning. Rather, they are constructed in order to simplify other operations, such as solving a system of linear equations or inverting a matrix. This is analogous in spirit to mechanical tricks like integration by parts in the sense that there is no hidden meaning present in the operation. Mind you this isn’t true of all factorizations. In cases like the singular value decomposition, the resultant matrices can contain information and is thus open to interpretation. One reason this is the case with SVD is that ultimately it is an eigenvalue method.

Exercises

  • Construct a matrix where the LU decomposition fails
  • Construct the permutation matrix that allows the product to have an LU decomposition

Eigenvalues and eigenvectors

In college there was a professor who claimed that eigenvalues were the most significant result of linear algebra. It took me a long time to appreciate why he said that, and the reason is likely due to the interpretation component. It is easy to get caught up in all the mechanical procedures and algorithms of linear algebra that we forget to think about what these concepts signify. So let’s take a look at eigenvalues and try to understand what they represent. Mechanically the concept of eigenvalues is simple: they are the scalar values \lambda that solve the equation A\vec{x} = \lambda x. What does this actually mean? Once again let’s start with a geometric interpretation. Consider the matrix A = \begin{bmatrix}2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1\end{bmatrix} , which comes from Example 7.1.4 of Kuttler. The eigenvectors are all the vectors whose direction when applied to a given matrix is a scaled version of itself. The fact that there are only three such vectors in an infinite space hints at the importance of the eigenvectors. Conversely, there are infinite number of LU decompositions, so there isn’t anything particularly interesting between one decomposition and another.

A <- matrix(c(2,2,-2, 1,3,-1, -1,1,1), nrow=3, byrow=TRUE)
es <- eigen(A)
vs <- es$vectors

open3d()
lines3d(rbind(0,vs[,1]), col='green')
lines3d(rbind(0,vs[,2]), col='red')
lines3d(rbind(0,vs[,3]), col='blue')
bbox3d(color=c("#333377","black"), emission="#333377",
  specular="#3333FF", shininess=5, alpha=0.8)

(Note this turns out to be a singular matrix, so the result is not well-behaved. I will try to update the matrix and re-upload images.)

eigenvectors

Now let’s see if this is true. It certainly looks like the eigenvectors are scaled copies of themselves.

scaled eigenvalues

If the eigenvalues are all distinct, then the eigenvectors are linearly independent. The significance of this is that every other vector in the space can be defined as a linear combination of these eigenvectors. Not only that but the corresponding eigenvalues can tell us how much of a contribution each eigenvector has to the original matrix. This observation is the foundation for principal component analysis as well as the singular value decomposition.

Exercises

  • Reproduce the values of the scaled eigenvectors
  • Why is the blue eigenvector not scaled in the graph?

Social network centrality

How do we go about interpreting eigenvalues and eigenvectors? As an example let’s look at network graph analysis and the concept of eigenvector centrality. A variation of this concept was used in the now infamous Google PageRank. The general premise is that by constructing an adjacency matrix that models an edge between two nodes as a 1, it is possible to create a measure based on the largest eigenvalue. Okay so let’s set up the problem.

Let A = \begin{bmatrix}1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1\end{bmatrix} represent the adjacency matrix of a graph G. This says user 1 links to itself and also to user 3. Similarly user 2 links to itself and user 3, while user 3 links to everyone. The vertex centrality measure is defined as

\begin{array}{rl}  x_v &= \frac{1}{\lambda} \sum_{t \in M(V)} x_t \\  &= \frac{1}{\lambda} \sum_{t \in G} a_{v,t} x_t \\  \\  \lambda \vec{x} &= \begin{bmatrix}\vec{a}_{1,} \cdot \vec{x} \\  \vdots \\  \vec{a}_{v,} \cdot \vec{x} \end{bmatrix} \\  &= A \vec{x}  \end{array} .

We want to find an eigenvector whose components are all positive. Since the adjacency matrix is square and non-negative, we have more tools to work with. For a strongly connected graph, we are guaranteed such an eigenvector corresponding to the largest eigenvalue thanks to the Perron–Frobenius theorem. The literature suggests using the power iteration method to find this eigenvalue. This approach (see here and here) is based on the observation that successive multiplication of a vector with a matrix A will amplify the principal component to a magnitude significantly greater than the other eigenvectors. Through this process the corresponding eigenvalue can be found. Hence the power method is iterative, where each iteration is defined by \begin{array}{rl}  q_k &= A x_{k-1} \\  x_k &= \frac{q_k}{\lVert q_k \rVert} \end{array} . Note that we can only find the largest eigenvalue via this method.

library(devtools)
install_github('lambda.tools','zatonovo')
library(lambda.tools)
A <- matrix(c(1,0,1, 0,1,1, 1,1,1), nrow=3, byrow=TRUE)
b <- matrix(rep(A,5), nrow=3)
x <- foldblock(b, 3, function(A, r) { q <- A %*% r; q / norm(q) }, rep(1,3))

A %*% x / x
         [,1]
[1,] 2.414214
[2,] 2.414214
[3,] 2.414214

Let’s see if using this eigenvector intuitively makes sense to us. The first thing to note is that the eigenvector with the largest eigenvalue provides the largest contribution of the linear combinations constructed. So within this vector the claim is that the coefficients of the vector providing a measure of centrality is reasonable. This implies that the eigenvector is being used as a proxy for an ordering function. Conceptually this makes sense since the coefficients of the eigenvector indicate the magnitude of the projection onto each axis. Thinking about the norm of a vector, then the largest coefficient is the largest contributor to the norm.

What does the eigenvector look like? It’s telling us that user 3 has the greatest centrality and that users 1 and 2 have the same centrality. This makes sense since in the original adjacency matrix, both user 1 and 2 have 2 links while user 3 has 3.

> x
          [,1]
[1,] 0.2928870
[2,] 0.2928870
[3,] 0.4142259

To do this for real would require pulling data for a social network and doing the encoding.

Exercises

  • Modify the adjacency matrix so that users 1 and 2 no longer have the same centrality.
  • Create an adjacency matrix for 20 users and calculate the largest eigenvector
  • Change the number of iterations from 5 to 10 in the power method. Is there a significant difference? What is the right number of iterations to use?

To leave a comment for the author, please follow the link and comment on his blog: Cartesian Faith » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.