Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I just finished covering a few numerical techniques for solving systems of equations, which can be applied to find best-fit lines through a give set of data points.

The four points $\{(0,0), (1,3), (2,3), (5,6)\}$ are arranged into an inconsistent system of four equations and two unknowns:

$b+a(0) = 0 \\ b+a(1) = 3 \\ b+a(2) = 3 \\ b+a(5) = 6$

The system can be represented in matrix form:

$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 3 \\ 6 \end{bmatrix}$

The least-squares solution vector can be found by solving the normal equations

$A^TA\vec{x_*}=A^T\vec{b}$

Solving for $A^TA$ and $A^T \vec{b}$ yields the following matrix system:

$\begin{bmatrix} 4 & 8 \\ 8 & 30 \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 12 \\ 39 \end{bmatrix}$

The system can be solved using Gauss-Seidel method.

Below, I run 10 iterations of Gauss-Seidel (visualized in the figure above). The estimated line gets successively closer to the true solution in green. The estimates are shown in blue – each iteration is shown in a darker shade than the next (see highlighted lines).

library(MASS) # package needed for ginv() function

A=cbind(c(4,8),c(8,30)) # coefficient matrix
b<-t(t(c(12,39))) # b-vector
x0<-t(t(c(0,0)))  # starting vector
iter<-10          # number of Gauss-Seidel iterations to run

L<-lower.tri(A)*A  # lower triang. A
U<-upper.tri(A)*A  # upper triang. A
D<-diag(diag(A))   # diag of A

# plot points
xc<-c(0,1,2,5)
yc<-c(0,3,3,6)
plot(xc,yc,
col='red',
xlab="X-Values",
ylab="Y-Values",
bty='n')
title(main="OLS Estimates")
legend("bottomright",
c("Data","Estimates","True Line"),
lty=c(0,1,1),
pch=c(1,NA,NA),
col=c("red","blue","green"),
bty='n')

# create color palette - lines will get darker with each iter
pal<-colorRampPalette(c("#f2f2f2", "Blue"))
colors<-pal(iter) # creates color palette of length(iter)

# plot true line
abline(a=1.0714,b=.8571,col='green')

n<-1
while(n<=iter){
print(x0)

# Gauss-Seidel formula
x1<-(ginv((L+D)))%*%((-U%*%x0)+b)
x0<-x1
n=n+1

# plot estimated line
abline(a=as.numeric(x0[2,1]), # slope of estimated line
b=as.numeric(x0[1,1]), # y-intercept of estimated line
col=colors[n]) # pick nth color in palette
}