Basic Econometrics in R and SAS

November 27, 2011

(This article was first published on Econometric Sense, and kindly contributed to R-bloggers)

Regression Basics

$y = b_0 + b_1 X$ (the regression line we want to fit)

The method of least squares chooses $b_0$ and $b_1$ to minimize the sum of squared vertical distances between the fitted line and the individual data observations $y_i$.



That is, minimize $\sum e_i^2 = \sum (y_i - b_0 - b_1 X_i)^2$ with respect to $b_0$ and $b_1$.
This can be accomplished by taking the partial derivative of $\sum e_i^2$ with respect to each coefficient and setting each one equal to zero:
$\partial \sum e_i^2 / \partial b_0 = 2 \sum (y_i - b_0 - b_1 X_i)(-1) = 0$
$\partial \sum e_i^2 / \partial b_1 = 2 \sum (y_i - b_0 - b_1 X_i)(-X_i) = 0$
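Solving for $b_0$ and $b_1$ yields the formulas for calculating the estimates by hand:
$b_0 = \bar{y} - b_1 \bar{X}$
$b_1 = \sum (X_i - \bar{X})(y_i - \bar{y}) / \sum (X_i - \bar{X})^2 = [\sum X_i y_i - n \bar{X} \bar{y}] / [\sum X_i^2 - n \bar{X}^2] = S(X,y) / SS(X)$
where $S(X,y)$ is the sum of cross-products of the deviations of X and y from their means, and $SS(X)$ is the sum of squared deviations of X from its mean.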
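The algebra between the two first-order conditions and the closed-form estimates is not spelled out above. One way to fill it in, using the same symbols and the standard textbook steps, is:

```latex
\begin{align*}
\sum_i y_i &= n\,b_0 + b_1 \sum_i X_i
  && \text{(from } \partial/\partial b_0 = 0\text{)} \\
\sum_i X_i y_i &= b_0 \sum_i X_i + b_1 \sum_i X_i^2
  && \text{(from } \partial/\partial b_1 = 0\text{)} \\
b_0 &= \bar{y} - b_1 \bar{X}
  && \text{(divide the first equation by } n\text{)} \\
\sum_i X_i y_i &= (\bar{y} - b_1 \bar{X})\, n \bar{X} + b_1 \sum_i X_i^2
  && \text{(substitute } b_0 \text{ and } \textstyle\sum_i X_i = n \bar{X}\text{)} \\
b_1 &= \frac{\sum_i X_i y_i - n \bar{X} \bar{y}}{\sum_i X_i^2 - n \bar{X}^2}
\end{align*}
```

The first two lines are often called the normal equations; the last line is the summation form of the slope used in the worked example.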
 
Example with Real Data: 

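Given real data, we can use the formulas above to derive (by hand, calculator, or Excel) the estimated values of $b_0$ and $b_1$, which give the line of best fit, minimizing $\sum e_i^2 = \sum (y_i - b_0 - b_1 X_i)^2$. For the five (X, y) observations used in the SAS and R code below, (1, 3), (2, 7), (3, 5), (4, 11), and (5, 14), the required quantities are: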

$n = 5$
$\sum X_i y_i = 146$
$\sum X_i^2 = 55$
$\bar{X} = 3$
$\bar{y} = 8$

$b_1 = [\sum X_i y_i - n \bar{X} \bar{y}] / [\sum X_i^2 - n \bar{X}^2] = (146 - 5 \cdot 3 \cdot 8) / (55 - 5 \cdot 3^2) = 26/10 = 2.6$
$b_0 = \bar{y} - b_1 \bar{X} = 8 - 2.6 \cdot 3 = 0.20$
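The hand calculation can also be reproduced step by step in R. The following is a minimal sketch that builds each sum from the five observations and plugs them into the formulas; the variable names (sum_xy, sum_x2, and so on) are illustrative choices, not part of the original post.

```r
# Hand calculation of the OLS estimates using the summation formulas.
# x and y are the five observations used throughout this example.
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)

n      <- length(x)    # number of observations: 5
sum_xy <- sum(x * y)   # sum of cross-products: 146
sum_x2 <- sum(x^2)     # sum of squared x values: 55
xbar   <- mean(x)      # mean of x: 3
ybar   <- mean(y)      # mean of y: 8

b1 <- (sum_xy - n * xbar * ybar) / (sum_x2 - n * xbar^2)  # slope
b0 <- ybar - b1 * xbar                                     # intercept

print(c(b0 = b0, b1 = b1))
```

Computing the intermediate sums separately makes it easy to compare each one against the values listed above before looking at the final estimates.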

You can verify these results in PROC REG in SAS.

/* GENERATE DATA */

DATA REGDAT;
INPUT X Y;
CARDS;
1 3
2 7
3 5
4 11
5 14
;
RUN;

/* BASIC REGRESSION WITH PROC REG */

PROC REG DATA = REGDAT;
MODEL Y = X;
RUN;
QUIT;

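OUTPUT: the parameter estimates reported by PROC REG are 0.2 for Intercept and 2.6 for X, matching the hand calculation above.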



Similarly, this can be done in R using the 'lm' function:

#------------------------------------------------------------
# regression with canned lm routine
#------------------------------------------------------------
 
# read in data manually
 
x <- c(1,2,3,4,5) # read in x -values
 
y <- c(3,7,5,11,14) # read in y-values
 
data1 <- data.frame(x,y) # create data set combining x and y values
 
# analysis
 
plot(data1$x, data1$y) # plot data
reg1 <- lm(y ~ x, data = data1) # compute regression estimates
summary(reg1) # print regression output
abline(reg1) # plot fitted regression line
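
The two first-order conditions from the derivation say that the residuals sum to zero and that they are uncorrelated with x. As a rough check, and only as a sketch on the same five observations, both can be verified on a fitted model; the object names fit and e are illustrative.

```r
# Refit the model on the same five observations.
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)
fit <- lm(y ~ x)

coef(fit)          # intercept and slope
e <- resid(fit)    # residuals e_i = y_i - b0 - b1 * x_i

# First-order conditions: both sums should be zero up to rounding error.
sum(e)
sum(x * e)
```

Because of floating-point arithmetic the two sums are typically tiny nonzero numbers rather than exact zeros.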

 
Regression Matrices

Alternatively, this problem can be represented in matrix format. 
We can then formulate the least squares equation as:
 y = Xb 
   
where the 'errors', or deviations from the fitted line, are given by the vector:
$e = y - Xb$

The matrix equivalent of $\sum e_i^2$ becomes $e'e = (y - Xb)'(y - Xb)$, which expands to:

$(y - Xb)'(y - Xb) = y'y - 2\,b'X'y + b'X'Xb$
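
The expansion skips one step. Multiplying out gives two cross terms, $y'Xb$ and $b'X'y$; each is a 1 x 1 scalar and one is the transpose of the other, so they are equal and can be combined:

```latex
\begin{align*}
e'e &= (y - Xb)'(y - Xb) \\
    &= y'y - y'Xb - b'X'y + b'X'Xb \\
    &= y'y - 2\,b'X'y + b'X'Xb
       && \text{since } y'Xb = (b'X'y)' = b'X'y
\end{align*}
```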

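Taking the partial derivative with respect to $b$, setting it equal to zero, and solving for $b$ gives:

$\partial e'e / \partial b = -2X'y + 2X'Xb = 0$

$2X'Xb = 2X'y$

$X'Xb = X'y$

$b = (X'X)^{-1} X'y$, a vector holding both $b_0$ and $b_1$, whose slope element is the matrix equivalent of what we had before:
$[\sum X_i y_i - n \bar{X} \bar{y}] / [\sum X_i^2 - n \bar{X}^2] = S(X,y) / SS(X)$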
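
To see why the matrix formula reduces to the summation formula, it may help to write the pieces out for a single regressor with an intercept. The sketch below assumes the design matrix has a column of ones followed by the X values, and plugs in the sums from the five-observation example ($\sum X_i = 15$, $\sum y_i = 40$):

```latex
\begin{align*}
X'X &= \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix}
     = \begin{pmatrix} 5 & 15 \\ 15 & 55 \end{pmatrix},
\qquad
X'y = \begin{pmatrix} \sum y_i \\ \sum X_i y_i \end{pmatrix}
     = \begin{pmatrix} 40 \\ 146 \end{pmatrix} \\[6pt]
(X'X)^{-1} &= \frac{1}{n \sum X_i^2 - (\sum X_i)^2}
  \begin{pmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{pmatrix}
  = \frac{1}{50} \begin{pmatrix} 55 & -15 \\ -15 & 5 \end{pmatrix} \\[6pt]
b_1 &= \frac{n \sum X_i y_i - \sum X_i \sum y_i}{n \sum X_i^2 - (\sum X_i)^2}
     = \frac{\sum X_i y_i - n \bar{X} \bar{y}}{\sum X_i^2 - n \bar{X}^2}
     = \frac{730 - 600}{50} = 2.6 \\[6pt]
b_0 &= \frac{\sum X_i^2 \sum y_i - \sum X_i \sum X_i y_i}{n \sum X_i^2 - (\sum X_i)^2}
     = \frac{2200 - 2190}{50} = 0.2
\end{align*}
```

Dividing the numerator and denominator of $b_1$ by $n$ gives the summation form used in the hand calculation.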
These computations can be carried out in SAS via PROC IML commands:

/* MATRIX REGRESSION */

PROC IML;

/* INPUT DATA AS VECTORS */
yt = {3 7 5 11 14} ; /* TRANSPOSED Y VECTOR */
x0t = j(1,5,1); /* ROW VECTOR OF 1'S */
x1t = {1 2 3 4 5}; /* X VALUES */
xt =x0t//x1t; /* COMBINE VECTORS INTO TRANSPOSED X-MATRIX */

PRINT yt x0t x1t;

/* FORMULATE REGRESSION MATRICES */

y= yt`;     /* VECTOR OF DEPENDENT VARIABLES */
x =xt`; /* FULL X OR DESIGN MATRIX */
beta = inv(x`*x)*x`*y;  /* OLS ESTIMATOR: b = (X'X)^-1 X'y */
TITLE 'REGRESSION MATRICES VIA PROC IML'; /* SET TITLE BEFORE PRINTING */
PRINT beta;
QUIT;

OUTPUT: the printed beta vector contains 0.2 (intercept) and 2.6 (slope), matching the PROC REG results.
The same results can be obtained in R as follows:
#------------------------------------------------------------
# matrix programming based approach
#------------------------------------------------------------
 
# regression matrices require a column of 1's in order to calculate
# the intercept or constant, create this column of 1's as x0
 
x0 <- c(1,1,1,1,1) # column of 1's
x1 <- c(1,2,3,4,5) # original x-values
 
# create the x- matrix of explanatory variables
 
x <- as.matrix(cbind(x0,x1))
 
# create the y-matrix of dependent variables
 
y <- as.matrix(c(3,7,5,11,14))
 
# estimate b = (X'X)^-1 X'y
 
b <- solve(t(x)%*%x)%*%t(x)%*%y
 
print(b) # this gives the intercept and slope - matching exactly
# the results above
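
As a cross-check, the matrix result can be compared directly with the coefficients from 'lm'. The sketch below also solves the normal equations X'Xb = X'y with the two-argument form of solve(), which avoids forming the explicit inverse; that is a substitution for illustration, not the approach used in the code above, and the names b_normal and b_lm are illustrative.

```r
# Design matrix with an intercept column, and the response vector.
x <- cbind(x0 = 1, x1 = c(1, 2, 3, 4, 5))
y <- c(3, 7, 5, 11, 14)

# Solve the normal equations X'X b = X'y directly (no explicit inverse).
# crossprod(x) is t(x) %*% x and crossprod(x, y) is t(x) %*% y.
b_normal <- solve(crossprod(x), crossprod(x, y))

# Reference fit from lm(), which uses a QR decomposition internally.
b_lm <- coef(lm(y ~ x[, "x1"]))

print(cbind(b_normal, b_lm))
all.equal(as.vector(b_normal), unname(b_lm))  # TRUE when the two agree
```

For a small, well-conditioned problem like this one the two approaches agree to rounding error; the difference matters mainly when the columns of X are nearly collinear.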