**Reimagined Invention**, and kindly contributed to R-bloggers)

This is the 1st theoretical post I’m writing on this blog. I’ll review some important Linear Algebra concepts that I’m using at work and these will save you a lot of time if you work with large datasets as I do.

- Diagonalization
- Cayley-Hamilton Theorem
- Time saving steps in R

# Diagonalization

Let \(\renewcommand{\vec}[1]{\boldsymbol{#1}} \newcommand{\R}{\mathbb{R}} A=

\), then \(A^2=

=

\) and \(A^3 = AA^2 =

=

\)$

In general,

If \(A\) can be displayed in a useful factorization of the form \(A = PDP^{-1}\), with \(D\) a diagonal matrix (\(d_{ij} = 0\) if \(i\neq j\)) and \(P\) invertible, then is easy to obtain \(A^k\).

Remember than an eigenvector of \(A\) is a vector \(\vec{v}\in \R^n\) such that \(A\vec{v} = \lambda \vec{v}\) with \(\lambda\) being an eigenvalue of \(A\).

*Theorem (Diagonalization Theorem).* Let \(A\) an \(n\times n\) matrix. \(A\) is diagonalizable if and only if \(A\) has \(n\) linearly independent eigenvectors.

In fact, \(A=PDP^{-1}\), with \(D\) a diagonal matrix and \(P\) an invertible matrix, if and only if the columns of \(P\) are \(n\) linearly independent vectors of \(A\). In this case, the diagonal entries of \(D\) are eigenvalues of \(A\) that correspond to the eigenvectors in \(P\).

*Proof.* Let \(P \in \R^{n\times n}\) being a matrix of eigenvectors of \(A\) and \(D\) being any diagonal matrix of eigenvalues, then

and

\vdots & \vdots & \ddots & \vdots\\0 & 0 & \ldots & \lambda_n\end{bmatrix} =

[\lambda_1\vec{v_1} \: \lambda_2\vec{v_2} \: \ldots \: \lambda_n\vec{v_n}].$$

Suppose that \(A\) is diagonalizable, then \(A = PDP^{-1}\) and by multiplying by \(P\) I obtain \(AP = PD\). By replacing the obtained values of \(AP\) and \(PD\) y obtain

Given that \(P\) is invertible, its columns \(\vec{v_1},\ldots,\vec{v_n}\) must be linearly independent. Those columns are also different from \(\vec{0}\), so \(\lambda_1,\ldots,\lambda_n\) are eigenvalues and \(\vec{v_1}, \ldots, \vec{v_n}\) are eigenvectors.

Given any \(n\) eigenvectors of \(A\), let \(P = [\vec{v_1} \: \vec{v_2} \: \ldots \vec{v_n}]\) and \(d_{ij} = \lambda_i\) for \(i = j\) and \(d_{ij} = 0\) in other case. Then \(AP = PD\) is true without imposing conditions on the eigenvectors. If the eigenvectors are linearly independent, then \(P\) is invertible and \(AP = PD\) implies \(A = PDP^{-1}\).

# Cayley-Hamilton Theorem

## Statement and Proof

Let \(A\in \R^{n\times n}\) the characteristic polynomial of \(A\) is defined as

*Theorem (Cayley-Hamilton, 1858).* Every square matrix is a root of its own characteristic polynomial.

*Proof.* Let \(A\in \R^{n\times n}\). The characteristic polynomial of \(A\) is a polinomial of \(\lambda\), its degree is \(n\), and can be written as

\begin{gather}\tag{*}

\text{det} (\lambda I -A) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + a_{n-2}\lambda^{n-2} + a_{n-3}\lambda^{n-3} + \ldots + a_1 \lambda + a_0.

\end{gather}

$$

Consider the adjoint matrix of \((\lambda I – A)\). By definition of the adjoint, this will be a matrix whose components are monic polinomials of \(\lambda\) and its highest degree is \(n-1\).

Hence, it is possible to write the adjoint as

\begin{gather}\tag{**}

\text{adj}(\lambda I – A) = \lambda^{n-1}B_{n-1} + \lambda^{n-2} B_{n-2} + \lambda^{n-3} B_{n-3} + \ldots + \lambda B_1 + B_0.

\end{gather}

$$

Where \(B_j \in \R^{n\times n}\) and each \(B_j\) is a constant matrix.

By using adjoint properties I have

By replacing this equation \((*)\) and \((**)\) I obtain

Hence,

By quating similar terms

\begin{array}{llll}

B_{n-1} = a_n I & \text{ multiply by } A^n & \rightarrow & A^n B_{n-1} = a_n A^n \cr

B_{n-2} – AB_{n-1} = a_{n-1} I & \text{ multiply by } A^{n-1} & \rightarrow & A^{n-1} B_{n-2} – A^n B_{n-1} = a_{n-1} A^{n-1} \cr

B_{n-3} – AB_{n-2} = a_{n-2} I & \text{ multiply by } A^{n-2} & \rightarrow & A^{n-2} B_{n-3} – A^{n-1} B_{n-2} = a_{n-2} A^{n-2} \cr

B_{n-4} – AB_{n-3} = a_{n-3} I & \text{ multiply by } A^{n-3} & \rightarrow & A^{n-3} B_{n-4} – A^{n-2} B_{n-3} = a_{n-3} A^{n-3} \cr

& & \vdots & \cr

B_0 – AB_1 = a_1 & \text{ multiply by } A & \rightarrow & A B_0 – A^2 B_1 = a_{1} A \cr

– AB_0 = a_0 I & \text{ multiply by } I & \rightarrow & -A B_0 = a_{0} I. \cr

\end{array}

$$

By summing these last equations I obtain

which concludes the proof.

## Application: Obtain the inverse of a matrix

Cayley-Hamilton theorem can be used, among other things, to obtain the inverse of a matrix. To do that I propose three steps.

*1st step.* Obtain the characteristic polinomial of \(A\) that is defined as

*2nd step.* Equate the characteristic polinomial to zero and replace \(\lambda\) by \(A\) and attach \(I\) to the terms where there is no replacement. Example:

*3rd step.* Solve for \(I\) and remember that \(AA^{-1}=I\). Continuing the last example:

## Examples

### 2×2 matrix

Consider the matrix

\begin{bmatrix}

1&2\\

3&4

\end{bmatrix}.

$$

To obtain the inverse of \(A\) these are the steps:

*1st step.* Obtain the characteristic polinomial of \(A\) and equate to zero

\left(

\begin{array}{cc}

\lambda-1 & -2 \\

-3 & \lambda-4

\end{array}

\right)=(\lambda-1)(\lambda-4)-6.

$$

*2nd step.* Take the last result and transform by replacing \(\lambda\) by \(A\)

*3rd step.* Factorize by \(A\) in the equation \(A^2-5A-2I = 0\) and solve for \(I\)

As \(AA^{-1}=I\) I obtain \(A^{-1} = \frac{1}{2} (A-5I)\) and then

\begin{bmatrix}

-2 & 1 \\

3/2 & -1/2

\end{bmatrix}.

$$

### 4×4 matrix

Here the method proves to be useful.

Consider the matrix

A=\begin{bmatrix}

1 & 0 & 0 & 0 \\

2 & 1 & 0 & 0 \\

3 & 2 & 1 & 0 \\

4 & 3 & 2 & 1

\end{bmatrix}.

$$

Following the last steps I obtain

\left|

\begin{array}{cccc}

\lambda – 1 & 0 & 0 & 0 \\

-2 & \lambda – 1 & 0 & 0 \\

-3 & -2 & \lambda – 1 & 0 \\

-4 & -3 & -2 & \lambda – 1

\end{array}

\right|=

(\lambda-1)^4.$$

By using \(\displaystyle(x+y)^n = \sum_{k=0}^n \frac{n!}{(n-k)!k!} \vec{x}^{n-k}\vec{y}^k\) I obtain

By doing the replacements I obtain

and then \(A^{-1}=-A^3+4A^2-6A+4I\).

Now I obtain the matrix that belongs to the obtained polinomial

-A^3=

\begin{bmatrix}

-1 & 0 & 0 & 0 \\

-6 & -1 & 0 & 0 \\

-21 & -6 & -1 & 0 \\

-56 & -21 & -6 & -1

\end{bmatrix}

\:

4A^2=

\begin{bmatrix}

4 & 0 & 0 & 0 \\

16 & 4 & 0 & 0 \\

40 & 16 & 4 & 0 \\

80 & 40 & 16 & 4

\end{bmatrix}

\:

-6A=

\begin{bmatrix}

-6 & 0 & 0 & 0 \\

-12 & -6 & 0 & 0 \\

-18 & -12 & -6 & 0 \\

-24 & -18 & -12 & -6

\end{bmatrix}.

$$

And by replacing the obtained matrix I replace in the obtained polinomial and I have the inverse of \(A\)

\begin{bmatrix}

1 & 0 & 0 & 0 \\

-2 & -1 & 0 & 0 \\

1 & -2 & 1 & 0 \\

0 & 1 & -2 & 1

\end{bmatrix}.

$$

# Time saving steps in R

```
A <- rbind(c(5,0,0,0),c(0,5,0,0),c(1,4,-3,0),c(-1,2,0,-3))
P <- eigen(A)$vectors
D <- diag(eigen(A)$values)
ptm <- proc.time()
A%*%A
```

```
## [,1] [,2] [,3] [,4]
## [1,] 25 0 0 0
## [2,] 0 25 0 0
## [3,] 2 8 9 0
## [4,] -2 4 0 9
```

```
proc.time() - ptm
```

```
## user system elapsed
## 0.003 0.000 0.003
```

```
ptm <- proc.time()
P%*%(D%*%D)%*%solve(P)
```

```
## [,1] [,2] [,3] [,4]
## [1,] 25 0 0 0
## [2,] 0 25 0 0
## [3,] 2 8 9 0
## [4,] -2 4 0 9
```

```
proc.time() - ptm
```

```
## user system elapsed
## 0.003 0.001 0.007
```

# Final comment

For a small matrix this can be tedious but for a large matrix consider that in some cases you can diagonalize and save a lot of memory if you are using, for example, R o Matlab and you need a really large correlation matrix.

**leave a comment**for the author, please follow the link and comment on their blog:

**Reimagined Invention**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...