# Basic Econometrics in R and SAS

[This article was first published on

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**Econometric Sense**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

**Regression Basics**

y= b0 + b1 *X ‘regression line we want to fit’

The method of least squares minimizes the squared distance between the line ‘y’ and

individual data observations yi. That is minimize: ∑ e

_{i}^{2}= ∑ (y_{i }– b_{0}– b_{1}X_{i})^{2}with respect to b_{0}and b_{1}.This can be accomplished by taking the partial derivatives of ∑ e

_{i}^{2}with respect to each coefficient and setting it equal to zero.∂ ∑ e

_{i}^{2 }/ ∂ b_{0}= 2 ∑ (y_{i }– b_{0}– b_{1}X_{i}) (-1) = 0∂ ∑ e

_{i}^{2 }/ ∂ b_{1}= 2 ∑(y_{i }– b_{0}– b_{1}X_{i}) (-X_{i}) = 0Solving for b

_{0}and b_{1}yields the ‘formulas’ for hand calculating the estimates:b

_{0}= y_{bar}– b_{1}X_{bar}b

_{1}= ∑ (( X_{i }– Xbar) (y_{i}– ybar)**)**/ ∑ ( X_{i }– Xbar) = [ ∑X_{i}Y_{i}– n xbar*ybar] / [∑X^{2}– n Xbar^{2}] = S( X,y) / SS(X)

**Example with Real Data:**

Given real data, we can use the formulas above to derive (by hand /caclulator/excel) the estimated values for b0 and b1, which give us the line of best fit, minimizing

**∑ e**_{i}^{2}= ∑ (y_{i }– b_{0}– b_{1}X_{i})^{2}.n= 5

∑X

_{i}Y_{i}= 146∑X

^{2 }= 55Xbar = 3

Ybar =8

b1 = [ ∑X

_{i}Y_{i}– n xbar*ybar] / [∑X^{2}– n Xbar^{2}] (146-5*3*8)/(55-5*3^{2}) = 26/10 =**2.6**b0= y

_{bar}– b_{1}X_{bar}= 8-2.6*3 =**.20****You can verify these results in PROC REG in SAS.**

/* GENEARATE DATA */

**DATA**REGDAT;

INPUT X Y;

CARDS;

1 3

2 7

3 5

4 11

5 14

;

**RUN**;

/* BASIC REGRESSION WITH PROC REG */

**PROC**

**REG**DATA = REGDAT;

MODEL Y = X;

**RUN**;

**Similarly this can be done in R using the ‘lm’ function:**

#------------------------------------------------------------ # regression with canned lm routine #------------------------------------------------------------ # read in data manually x <- c(1,2,3,4,5) # read in x -values y <- c(3,7,5,11,14) # read in y-values data1 <- data.frame(x,y) # create data set combining x and y values # analysis plot(data1$x, data1$y) # plot data reg1 <- lm(data1$y~data1$x) # compute regression estimates summary(reg1) # print regression output abline(reg1) # plot fitted regression line

**Regression Matrices**

Alternatively, this problem can be represented in matrix format.

We can then formulate the least squares equation as:

**y = Xb**

where the ‘errors’ or deviations from the fitted line can be formulated by the matrix :

**e =**(

**y – Xb)**

The matrix equivalent of ∑ e

_{i}^{2}becomes**(y - Xb)’ (y - Xb)**=**e’e****=**

**(y - Xb)’ (y - Xb) = y’y -**2

*** b’X’y + b’X’Xb**

Taking partials, setting = 0, and solving for

**b**gives:d

**e’e**/ d**b = -**2*** X’y +**2*** X’Xb =**02

**X’Xb =**2**X’y****X’Xb = X’y**

**b = (X’X)**

^{-1}

**X’y**which is the matrix equivalent to what we had before:

[ ∑X

_{i}Y_{i}– n xbar*ybar] / [∑X^{2}– n Xbar^{2}] = S( X,y) / SS(X)

**These computations can be carried out in SAS via PROC IML commands:**

/* MATRIX REGRESSION */

**PROC**

**IML**;

/* INPUT DATA AS VECTORS */

yt = {

**3****7****5****11****14**} ; /* TRANSPOSED Y VECTOR */x0t = j(

**1**,**5**,**1**); /* ROW VECTOR OF 1'S */x1t = {

**1****2****3****4****5**}; /* X VALUES */xt =x0t//x1t; /* COMBINE VECTORS INTO TRANSPOSED X-MATRIX */

PRINT yt x0t x1t;

/* FORMULATE REGRESSION MATRICES */

y= yt`; /* VECTOR OF DEPENDENT VARIABLES */

x =xt`; /* FULL X OR DESIGN MATRIX */

beta = inv(x`*x)*x`*y; /* THE CLASSICAL REGRESSION MATRIX */

PRINT beta;

TITLE 'REGRESSION MATRICES VIA PROC IML';

**QUIT**;

**RUN**;

OUTPUT

**The same results can be obtained in R as follows:**

#------------------------------------------------------------ # matrix programming based approach #------------------------------------------------------------ # regression matrices require a column of 1's in order to calculate # the intercept or constant, create this column of 1's as x0 x0 <- c(1,1,1,1,1) # column of 1's x1 <- c(1,2,3,4,5) # original x-values # create the x- matrix of explanatory variables x <- as.matrix(cbind(x0,x1)) # create the y-matrix of dependent variables y <- as.matrix(c(3,7,5,11,14)) # estimate b = (X'X)^-1 X'y b <- solve(t(x)%*%x)%*%t(x)%*%y print(b) # this gives the intercept and slope - matching exactly # the results above

To

**leave a comment**for the author, please follow the link and comment on their blog:**Econometric Sense**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.