Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Regression Basics
y= b0 + b1 *X  ‘regression line we want to fit’
The method of least squares minimizes the squared distance between the line ‘y’ and

individual data observations yi

That is minimize: ∑ ei2 = ∑ (yi – b0 –  b1 Xi )2   with respect to b0 and  b1 .
This can be accomplished by taking the partial derivatives of  ∑ ei2 with respect to each coefficient and setting it equal to zero.
∂ ∑ ei2 / ∂ b0 =  2 ∑ (yi – b0 –  b1 Xi )  (-1) = 0
∂ ∑ ei2 / ∂ b1 =   2 ∑(yi – b0 –  b1 Xi )  (-Xi) = 0
Solving for b0 and  b1 yields the ‘formulas’ for hand calculating the estimates:
b0 = ybar – b1 Xbar
b1 = ∑ (( Xi – Xbar)  (yi – ybar)) / ∑ ( Xi – Xbar) =  [ ∑Xi Yi  – n xbar*ybar] / [∑X2 – n Xbar2
=   S( X,y) / SS(X)

Example with Real Data:
Given real data, we can use the formulas above to derive (by hand /caclulator/excel) the estimated values for b0 and b1, which give us the line of best fit, minimizing  ∑ ei2 = ∑ (yi – b0 –  b1 Xi )2  .
n= 5
∑Xi Yi   = 146
∑X2  = 55
Xbar = 3
Ybar =8
b1 =  [ ∑Xi Yi  – n xbar*ybar] / [∑X2 – n Xbar2]    (146-5*3*8)/(55-5*32) = 26/10 = 2.6
b0= ybar – b1 Xbar  = 8-2.6*3 = .20
You can verify these results in PROC REG in SAS.
/* GENEARATE DATA */
DATA REGDAT;
INPUT X Y;
CARDS;
1 3
2 7
3 5
4 11
5 14
;
RUN;
/* BASIC REGRESSION WITH PROC REG */
PROC REG DATA = REGDAT;
MODEL Y = X;
RUN;
Similarly this can be done in R using the ‘lm’ function:
```#------------------------------------------------------------
#  regression with canned lm routine
#------------------------------------------------------------

x <- c(1,2,3,4,5) # read in x -values

y <- c(3,7,5,11,14) # read in y-values

data1 <- data.frame(x,y) # create data set combining x and y values

# analysis

plot(data1\$x, data1\$y) # plot data
reg1 <- lm(data1\$y~data1\$x) # compute regression estimates
summary(reg1)              # print regression output
abline(reg1)               # plot fitted regression line```

Regression Matrices
Alternatively, this problem can be represented in matrix format.
We can then formulate the least squares equation as:
y = Xb

where the ‘errors’  or deviations from the fitted line can be formulated by the matrix :
e = (y – Xb)
The matrix equivalent of ∑ ei2  becomes (y - Xb)’ (y - Xb) = e’e
= (y - Xb)’ (y - Xb) = y’y - 2 * b’X’y + b’X’Xb
Taking partials, setting = 0, and solving for  b   gives:
d e’e / d b = -2 * X’y +2* X’Xb = 0
2 X’Xb =   2 X’y
X’Xb = X’y
b = (X’X)-1  X’y   which is the matrix equivalent to what we had before:
[ ∑Xi Yi  – n xbar*ybar] / [∑X2 – n Xbar2]  =   S( X,y) / SS(X)

These computations can be carried out in SAS via PROC IML commands:

/* MATRIX REGRESSION */
PROC IML;
/* INPUT DATA AS VECTORS */
yt = {3 7 5 11 14} ; /* TRANSPOSED Y VECTOR */
x0t = j(1,5,1); /* ROW VECTOR OF 1'S */
x1t = {1 2 3 4 5}; /* X VALUES */
xt =x0t//x1t; /* COMBINE VECTORS INTO TRANSPOSED X-MATRIX */
PRINT yt x0t x1t;
/* FORMULATE REGRESSION MATRICES */
y= yt`;     /* VECTOR OF DEPENDENT VARIABLES */
x =xt`; /* FULL X OR DESIGN MATRIX */
beta = inv(x`*x)*x`*y;  /* THE CLASSICAL REGRESSION MATRIX */
PRINT beta;
TITLE 'REGRESSION MATRICES VIA PROC IML';
QUIT;
RUN;

OUTPUT

The same results can be obtained in R as follows:

```#------------------------------------------------------------
#   matrix programming based approach
#------------------------------------------------------------

# regression matrices require a column of 1's in order to calculate
# the intercept or constant, create this column of 1's as x0

x0 <- c(1,1,1,1,1) # column of 1's
x1 <- c(1,2,3,4,5) # original x-values

# create the x- matrix of explanatory variables

x <- as.matrix(cbind(x0,x1))

# create the y-matrix of dependent variables

y <- as.matrix(c(3,7,5,11,14))

# estimate  b = (X'X)^-1 X'y

b <- solve(t(x)%*%x)%*%t(x)%*%y

print(b) # this gives the intercept and slope - matching exactly
# the results above```