Data Analysis Lectures


This is the first theoretical post I’m writing on this blog. I’ll review some important linear algebra concepts that I use at work; they can save you a lot of time if you work with large datasets, as I do.

  • Diagonalization
  • Cayley-Hamilton Theorem
  • Time saving steps in R

Diagonalization

Let \(\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}} A = \begin{bmatrix}5&0\\0&3\end{bmatrix}\). Then

$$A^2 = \begin{bmatrix}5&0\\0&3\end{bmatrix}\begin{bmatrix}5&0\\0&3\end{bmatrix} = \begin{bmatrix}5^2&0\\0&3^2\end{bmatrix} \quad\text{and}\quad A^3 = AA^2 = \begin{bmatrix}5&0\\0&3\end{bmatrix}\begin{bmatrix}5^2&0\\0&3^2\end{bmatrix} = \begin{bmatrix}5^3&0\\0&3^3\end{bmatrix}.$$

In general,

$$A^k = \begin{bmatrix}5^k&0\\0&3^k\end{bmatrix} \: \text{for} \: k\geq 1.$$

If \(A\) admits a factorization of the form \(A = PDP^{-1}\), with \(D\) a diagonal matrix (\(d_{ij} = 0\) if \(i\neq j\)) and \(P\) invertible, then it is easy to obtain \(A^k\).
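
This is useful because the inner factors cancel when multiplying the powers:

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}, \quad \text{and in general} \quad A^k = PD^kP^{-1},$$

where \(D^k\) is obtained by raising each diagonal entry of \(D\) to the \(k\)-th power.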

Remember that an eigenvector of \(A\) is a nonzero vector \(\vec{v}\in \R^n\) such that \(A\vec{v} = \lambda \vec{v}\), where \(\lambda\) is the corresponding eigenvalue of \(A\).

Theorem (Diagonalization Theorem). Let \(A\) be an \(n\times n\) matrix. \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors. In fact, \(A=PDP^{-1}\), with \(D\) a diagonal matrix and \(P\) an invertible matrix, if and only if the columns of \(P\) are \(n\) linearly independent eigenvectors of \(A\). In this case, the diagonal entries of \(D\) are the eigenvalues of \(A\) that correspond to the eigenvectors in \(P\).

Proof. Let \(P \in \R^{n\times n}\) be any matrix with columns \(\vec{v_1},\ldots,\vec{v_n}\) and let \(D\) be any diagonal matrix with diagonal entries \(\lambda_1,\ldots,\lambda_n\), then

$$AP = A[\vec{v_1} \: \vec{v_2} \: \vec{v_3} \: \ldots \: \vec{v_n}] = [A\vec{v_1} \: A\vec{v_2} \: A\vec{v_3} \: \ldots \: A\vec{v_n}]$$

and

$$PD = P\begin{bmatrix}\lambda_1 & 0 & \ldots & 0\\0 & \lambda_2 & \ldots & 0\\ \vdots & \vdots & \ddots & \vdots\\0 & 0 & \ldots & \lambda_n\end{bmatrix} = [\lambda_1\vec{v_1} \: \lambda_2\vec{v_2} \: \ldots \: \lambda_n\vec{v_n}].$$

Suppose that \(A\) is diagonalizable, so \(A = PDP^{-1}\). Multiplying on the right by \(P\) gives \(AP = PD\). By substituting the expressions obtained above for \(AP\) and \(PD\) I obtain

$$A\vec{v_1} = \lambda_1 \vec{v_1}, \: A\vec{v_2} = \lambda_2 \vec{v_2},\: \ldots,\: A\vec{v_n} = \lambda_n \vec{v_n}$$

Given that \(P\) is invertible, its columns \(\vec{v_1},\ldots,\vec{v_n}\) must be linearly independent. Those columns are also different from \(\vec{0}\), so \(\lambda_1,\ldots,\lambda_n\) are eigenvalues and \(\vec{v_1}, \ldots, \vec{v_n}\) are eigenvectors.

Conversely, given any \(n\) eigenvectors \(\vec{v_1},\ldots,\vec{v_n}\) of \(A\) with corresponding eigenvalues \(\lambda_1,\ldots,\lambda_n\), let \(P = [\vec{v_1} \: \vec{v_2} \: \ldots \: \vec{v_n}]\) and \(d_{ij} = \lambda_i\) if \(i = j\) and \(d_{ij} = 0\) otherwise. Then \(AP = PD\) holds without imposing any condition on the eigenvectors. If the eigenvectors are linearly independent, then \(P\) is invertible and \(AP = PD\) implies \(A = PDP^{-1}\).
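
As a quick sanity check of the theorem, here is a minimal sketch in base R using eigen(); the matrix is only an illustrative example:

A <- matrix(c(5, 0,
              0, 3), nrow = 2, byrow = TRUE)
e <- eigen(A)
P <- e$vectors        # columns are eigenvectors of A
D <- diag(e$values)   # diagonal matrix of the corresponding eigenvalues
P %*% D %*% solve(P)  # recovers A, so A = P D P^{-1}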

Cayley-Hamilton Theorem

Statement and Proof

Let \(A\in \R^{n\times n}\). The characteristic polynomial of \(A\) is defined as

$$P_A(\lambda) = \text{det}(\lambda I - A), \quad \lambda \in \R.$$

Theorem (Cayley-Hamilton, 1858). Every square matrix is a root of its own characteristic polynomial, that is, \(P_A(A) = 0\).

Proof. Let \(A\in \R^{n\times n}\). The characteristic polynomial of \(A\) is a polynomial in \(\lambda\) of degree \(n\), and can be written as

$$ \begin{gather}\tag{*} \text{det} (\lambda I -A) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + a_{n-2}\lambda^{n-2} + a_{n-3}\lambda^{n-3} + \ldots + a_1 \lambda + a_0. \end{gather} $$

Consider the adjoint (adjugate) matrix of \((\lambda I - A)\). By definition of the adjoint, this is a matrix whose entries are polynomials in \(\lambda\) of degree at most \(n-1\).

Hence, it is possible to write the adjoint as

$$ \begin{gather}\tag{**} \text{adj}(\lambda I - A) = \lambda^{n-1}B_{n-1} + \lambda^{n-2} B_{n-2} + \lambda^{n-3} B_{n-3} + \ldots + \lambda B_1 + B_0, \end{gather} $$

where each \(B_j \in \R^{n\times n}\) is a constant matrix that does not depend on \(\lambda\).

By the fundamental property of the adjoint I have

$$(\lambda I - A)\, \text{adj}(\lambda I - A) = \text{det}(\lambda I - A)\, I.$$

By substituting \((*)\) and \((**)\) into this equation I obtain

$$(\lambda I - A) \left(\sum_{i=1}^n \lambda^{n-i}B_{n-i}\right) = \left(\sum_{i=0}^n a_{n-i}\lambda^{n-i}\right)I.$$

Hence,

$$(\lambda^nB_{n-1}-\lambda^{n-1}AB_{n-1}) + \ldots + (\lambda B_0 - AB_0) = a_n \lambda^n I + \ldots + a_0 I.$$

By equating the coefficients of like powers of \(\lambda\),

$$ \begin{array}{llll} B_{n-1} = a_n I & \text{ multiply by } A^n & \rightarrow & A^n B_{n-1} = a_n A^n \cr B_{n-2} - AB_{n-1} = a_{n-1} I & \text{ multiply by } A^{n-1} & \rightarrow & A^{n-1} B_{n-2} - A^n B_{n-1} = a_{n-1} A^{n-1} \cr B_{n-3} - AB_{n-2} = a_{n-2} I & \text{ multiply by } A^{n-2} & \rightarrow & A^{n-2} B_{n-3} - A^{n-1} B_{n-2} = a_{n-2} A^{n-2} \cr B_{n-4} - AB_{n-3} = a_{n-3} I & \text{ multiply by } A^{n-3} & \rightarrow & A^{n-3} B_{n-4} - A^{n-2} B_{n-3} = a_{n-3} A^{n-3} \cr & & \vdots & \cr B_0 - AB_1 = a_1 I & \text{ multiply by } A & \rightarrow & A B_0 - A^2 B_1 = a_{1} A \cr - AB_0 = a_0 I & \text{ multiply by } I & \rightarrow & -A B_0 = a_{0} I. \cr \end{array} $$

By summing these last equations, the left-hand sides telescope and cancel to zero, so I obtain

$$a_n A^n + a_{n-1} A^{n-1} + \ldots + a_1 A + a_0 I = 0$$

which concludes the proof.
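
For a quick numerical illustration (not part of the proof), recall that for a \(2\times 2\) matrix the characteristic polynomial is \(\lambda^2 - \text{tr}(A)\lambda + \text{det}(A)\), so the theorem says that \(A^2 - \text{tr}(A)A + \text{det}(A)I\) is the zero matrix. A minimal check in R, using an arbitrary example matrix:

A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
A %*% A - sum(diag(A)) * A + det(A) * diag(2)  # should print the 2 x 2 zero matrix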

Application: Obtain the inverse of a matrix

The Cayley-Hamilton theorem can be used, among other things, to obtain the inverse of a matrix. To do that I propose three steps.

1st step. Obtain the characteristic polynomial of \(A\), which is defined as

$$p(\lambda)=\text{det}(\lambda I - A).$$

2nd step. Equate the characteristic polynomial to zero, replace \(\lambda\) by \(A\), and attach \(I\) to the constant term. Example:

$$p(\lambda)=\lambda^2 + 2\lambda + 1 = 0 \:\rightarrow\: p(A)=A^2 + 2A + I = 0.$$

3rd step. Solve for \(I\) and remember that \(AA^{-1}=I\). Continuing the last example:

$$I= A(-A-2I) \:\rightarrow\: A^{-1}=-(A+2I).$$

Examples

2×2 matrix

Consider the matrix

$$A= \begin{bmatrix} 1&2\\ 3&4 \end{bmatrix}. $$

To obtain the inverse of \(A\) these are the steps:

1st step. Obtain the characteristic polynomial of \(A\) and equate it to zero

$$p(\lambda)= \text{det}(\lambda I - A) = \left| \begin{array}{cc} \lambda-1 & -2 \\ -3 & \lambda-4 \end{array} \right|=(\lambda-1)(\lambda-4)-6. $$

2nd step. Take the last result, set it equal to zero, and replace \(\lambda\) by \(A\)

$$p(\lambda)=0 \rightarrow p(A)=(A-I)(A-4I)-6I = A^2-5A-2I = 0.$$

3rd step. Factor out \(A\) in the equation \(A^2-5A-2I = 0\) and solve for \(I\)

$$\frac{1}{2}A(A-5I) = I.$$

As \(AA^{-1}=I\) I obtain \(A^{-1} = \frac{1}{2} (A-5I)\) and then

$$A^{-1} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}. $$
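
This can be verified in R against solve(); a minimal check (the matrix A is the one above):

A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
0.5 * (A - 5 * diag(2))  # inverse from the characteristic polynomial
solve(A)                 # base R inverse, same result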

4×4 matrix

Here the method proves to be useful.

Consider the matrix

$$ A=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 4 & 3 & 2 & 1 \end{bmatrix}. $$

Following the last steps I obtain

$$p(\lambda)=\text{det}(\lambda I - A) = \left| \begin{array}{cccc} \lambda - 1 & 0 & 0 & 0 \\ -2 & \lambda - 1 & 0 & 0 \\ -3 & -2 & \lambda - 1 & 0 \\ -4 & -3 & -2 & \lambda - 1 \end{array} \right|= (\lambda-1)^4.$$

By using the binomial theorem \(\displaystyle(x+y)^n = \sum_{k=0}^n \frac{n!}{(n-k)!k!} x^{n-k}y^k\) I obtain

$$\text{det}(\lambda I - A) = \lambda^4 - 4\lambda^3 + 6\lambda^2 - 4\lambda + 1.$$

By doing the replacements I obtain

$$p(\lambda) = 0 \rightarrow p(A)=A^4-4A^3+6A^2-4A+I=0 \rightarrow A(-A^3+4A^2-6A+4I)=I$$

and then \(A^{-1}=-A^3+4A^2-6A+4I\).

Now I compute each matrix power that appears in the polynomial

$$ -A^3= \begin{bmatrix} -1 & 0 & 0 & 0 \\ -6 & -1 & 0 & 0 \\ -21 & -6 & -1 & 0 \\ -56 & -21 & -6 & -1 \end{bmatrix} \: 4A^2= \begin{bmatrix} 4 & 0 & 0 & 0 \\ 16 & 4 & 0 & 0 \\ 40 & 16 & 4 & 0 \\ 80 & 40 & 16 & 4 \end{bmatrix} \: -6A= \begin{bmatrix} -6 & 0 & 0 & 0 \\ -12 & -6 & 0 & 0 \\ -18 & -12 & -6 & 0 \\ -24 & -18 & -12 & -6 \end{bmatrix}. $$

Adding these matrices together with \(4I\) gives the inverse of \(A\)

$$A^{-1}= \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{bmatrix}. $$
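
Again, a minimal check in R that this matches solve(); a sketch reusing the 4×4 matrix above:

A <- matrix(c(1, 0, 0, 0,
              2, 1, 0, 0,
              3, 2, 1, 0,
              4, 3, 2, 1), nrow = 4, byrow = TRUE)
Ainv <- -A %*% A %*% A + 4 * A %*% A - 6 * A + 4 * diag(4)
Ainv         # the inverse obtained from the characteristic polynomial
Ainv %*% A   # recovers the 4 x 4 identity matrix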

Time saving steps in R

A <- rbind(c(5,0,0,0), c(0,5,0,0), c(1,4,-3,0), c(-1,2,0,-3))
P <- eigen(A)$vectors       # columns are the eigenvectors of A
D <- diag(eigen(A)$values)  # diagonal matrix of the corresponding eigenvalues

ptm <- proc.time()
A %*% A                     # direct matrix product

##      [,1] [,2] [,3] [,4]
## [1,]   25    0    0    0
## [2,]    0   25    0    0
## [3,]    2    8    9    0
## [4,]   -2    4    0    9

proc.time() - ptm

##    user  system elapsed 
##   0.003   0.000   0.003

ptm <- proc.time()
P %*% (D %*% D) %*% solve(P)  # the same product via A = P D P^{-1}

##      [,1] [,2] [,3] [,4]
## [1,]   25    0    0    0
## [2,]    0   25    0    0
## [3,]    2    8    9    0
## [4,]   -2    4    0    9

proc.time() - ptm

##    user  system elapsed 
##   0.003   0.001   0.007
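
The same idea extends to higher powers: once P and D are available, \(A^k = PD^kP^{-1}\) only requires powering the diagonal entries of D. A sketch, reusing A, P and D from the chunk above (the choice k = 10 is arbitrary):

k <- 10
P %*% D^k %*% solve(P)  # equals A multiplied by itself k times (up to floating point);
                        # elementwise ^ is the matrix power here because D is diagonal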

Final comment

For a small matrix this can be tedious, but for a large matrix consider that in some cases you can diagonalize and save a lot of memory if you are using, for example, R or Matlab and you need a really large correlation matrix.
