Basic Econometrics in R and SAS

Posted on November 27, 2011 by Matt Bogard in R bloggers | 0 Comments

[This article was first published on Econometric Sense, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Regression Basics

y= b0 + b1 *X ‘regression line we want to fit’

The method of least squares minimizes the squared distance between the line ‘y’ and

individual data observations yi.

That is minimize: ∑ e_i² = ∑ (y_i– b₀ – b₁ X_i )² with respect to b₀ and b₁ .

This can be accomplished by taking the partial derivatives of ∑ e_i² with respect to each coefficient and setting it equal to zero.

∂ ∑ e_i²/ ∂ b₀ = 2 ∑ (y_i– b₀ – b₁ X_i ) (-1) = 0

∂ ∑ e_i²/ ∂ b₁ = 2 ∑(y_i– b₀ – b₁ X_i ) (-X_i) = 0

Solving for b₀ and b₁ yields the ‘formulas’ for hand calculating the estimates:

b₀ = y_bar – b₁ X_bar

b₁ = ∑ (( X_i– Xbar) (y_i – ybar)) / ∑ ( X_i– Xbar) = [ ∑X_i Y_i – n xbar*ybar] / [∑X² – n Xbar²]

= S( X,y) / SS(X)

Example with Real Data:

Given real data, we can use the formulas above to derive (by hand /caclulator/excel) the estimated values for b0 and b1, which give us the line of best fit, minimizing ∑ e_i² = ∑ (y_i– b₀ – b₁ X_i )² .

n= 5

∑X_i Y_i = 146

∑X² = 55

Xbar = 3

Ybar =8

b1 = [ ∑X_i Y_i – n xbar*ybar] / [∑X² – n Xbar²] (146-5*3*8)/(55-5*3²) = 26/10 = 2.6

b0= y_bar – b₁ X_bar = 8-2.6*3 = .20

You can verify these results in PROC REG in SAS.

/* GENEARATE DATA */

DATA REGDAT;

INPUT X Y;

CARDS;

1 3

2 7

3 5

4 11

5 14

;

RUN;

/* BASIC REGRESSION WITH PROC REG */

PROC REG DATA = REGDAT;

MODEL Y = X;

RUN;

QUIT;

OUTPUT:

Similarly this can be done in R using the ‘lm’ function:

#------------------------------------------------------------
#  regression with canned lm routine
#------------------------------------------------------------
 
# read in data manually
 
x <- c(1,2,3,4,5) # read in x -values
 
y <- c(3,7,5,11,14) # read in y-values
 
data1 <- data.frame(x,y) # create data set combining x and y values
 
# analysis
 
plot(data1$x, data1$y) # plot data 
reg1 <- lm(data1$y~data1$x) # compute regression estimates
summary(reg1)              # print regression output
abline(reg1)               # plot fitted regression line

Regression Matrices

Alternatively, this problem can be represented in matrix format.

We can then formulate the least squares equation as:

y = Xb

where the ‘errors’ or deviations from the fitted line can be formulated by the matrix :

e = (y – Xb)

The matrix equivalent of ∑ e_i² becomes (y - Xb)’ (y - Xb) = e’e

= (y - Xb)’ (y - Xb) = y’y - 2 * b’X’y + b’X’Xb

Taking partials, setting = 0, and solving for b gives:

d e’e / d b = -2 * X’y +2* X’Xb = 0

2 X’Xb = 2 X’y

X’Xb = X’y

b = (X’X)^-1 X’y which is the matrix equivalent to what we had before:

[ ∑X_i Y_i – n xbar*ybar] / [∑X² – n Xbar²] = S( X,y) / SS(X)

These computations can be carried out in SAS via PROC IML commands:

/* MATRIX REGRESSION */

PROC IML;

/* INPUT DATA AS VECTORS */

yt = {3 7 5 11 14} ; /* TRANSPOSED Y VECTOR */

x0t = j(1,5,1); /* ROW VECTOR OF 1'S */

x1t = {1 2 3 4 5}; /* X VALUES */

xt =x0t//x1t; /* COMBINE VECTORS INTO TRANSPOSED X-MATRIX */

PRINT yt x0t x1t;

/* FORMULATE REGRESSION MATRICES */

y= yt`; /* VECTOR OF DEPENDENT VARIABLES */

x =xt`; /* FULL X OR DESIGN MATRIX */

beta = inv(x`*x)*x`*y; /* THE CLASSICAL REGRESSION MATRIX */

PRINT beta;

TITLE 'REGRESSION MATRICES VIA PROC IML';

QUIT;

RUN;

OUTPUT

The same results can be obtained in R as follows:

#------------------------------------------------------------
#   matrix programming based approach
#------------------------------------------------------------
 
# regression matrices require a column of 1's in order to calculate 
# the intercept or constant, create this column of 1's as x0
 
x0 <- c(1,1,1,1,1) # column of 1's
x1 <- c(1,2,3,4,5) # original x-values
 
# create the x- matrix of explanatory variables
 
x <- as.matrix(cbind(x0,x1))
 
# create the y-matrix of dependent variables
 
y <- as.matrix(c(3,7,5,11,14))
 
# estimate  b = (X'X)^-1 X'y
 
b <- solve(t(x)%*%x)%*%t(x)%*%y
 
print(b) # this gives the intercept and slope - matching exactly 
         # the results above

To leave a comment for the author, please follow the link and comment on their blog: Econometric Sense.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Basic Econometrics in R and SAS

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)