# Basic Econometrics in R and SAS

November 27, 2011

(This article was first published on Econometric Sense, and kindly contributed to R-bloggers)

Regression Basics
The regression line we want to fit is

$$y = b_0 + b_1 X$$

The method of least squares chooses $b_0$ and $b_1$ to minimize the sum of squared vertical distances between the fitted line and the individual data observations $y_i$. That is, minimize

$$\sum e_i^2 = \sum (y_i - b_0 - b_1 X_i)^2$$

with respect to $b_0$ and $b_1$. This can be accomplished by taking the partial derivative of $\sum e_i^2$ with respect to each coefficient and setting it equal to zero:

$$\frac{\partial \sum e_i^2}{\partial b_0} = 2 \sum (y_i - b_0 - b_1 X_i)(-1) = 0$$

$$\frac{\partial \sum e_i^2}{\partial b_1} = 2 \sum (y_i - b_0 - b_1 X_i)(-X_i) = 0$$

Solving for $b_0$ and $b_1$ yields the 'formulas' for hand calculating the estimates:

$$b_0 = \bar{y} - b_1 \bar{X}$$

$$b_1 = \frac{\sum (X_i - \bar{X})(y_i - \bar{y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i y_i - n \bar{X} \bar{y}}{\sum X_i^2 - n \bar{X}^2} = \frac{S(X,y)}{SS(X)}$$
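The algebra between the first-order conditions and the hand formulas is skipped above. A sketch of the missing step, using the same symbols: dividing each condition by $-2$ and rearranging gives the two normal equations, which can then be solved by substitution.

$$
\begin{aligned}
\sum y_i &= n b_0 + b_1 \sum X_i \\
\sum X_i y_i &= b_0 \sum X_i + b_1 \sum X_i^2
\end{aligned}
$$

Dividing the first equation by $n$ gives $b_0 = \bar{y} - b_1 \bar{X}$. Substituting this into the second equation, with $\sum X_i = n \bar{X}$:

$$
\begin{aligned}
\sum X_i y_i &= (\bar{y} - b_1 \bar{X})\, n \bar{X} + b_1 \sum X_i^2 \\
\sum X_i y_i - n \bar{X} \bar{y} &= b_1 \left( \sum X_i^2 - n \bar{X}^2 \right) \\
b_1 &= \frac{\sum X_i y_i - n \bar{X} \bar{y}}{\sum X_i^2 - n \bar{X}^2}
\end{aligned}
$$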

Example with Real Data:
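Given real data, we can use the formulas above to derive (by hand, calculator, or Excel) the estimated values of $b_0$ and $b_1$, which give us the line of best fit, minimizing $\sum e_i^2 = \sum (y_i - b_0 - b_1 X_i)^2$. For the five observations used in the code that follows:

$n = 5$
$\sum X_i y_i = 146$
$\sum X_i^2 = 55$
$\bar{X} = 3$
$\bar{y} = 8$

$$b_1 = \frac{\sum X_i y_i - n \bar{X} \bar{y}}{\sum X_i^2 - n \bar{X}^2} = \frac{146 - 5 \cdot 3 \cdot 8}{55 - 5 \cdot 3^2} = \frac{26}{10} = 2.6$$

$$b_0 = \bar{y} - b_1 \bar{X} = 8 - 2.6 \cdot 3 = 0.2$$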
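The hand calculation can also be reproduced step by step in R. The following is a minimal sketch, assuming the five (x, y) pairs from the SAS data step in this post; the variable names `sxy` and `ssx` are illustrative choices, not part of the original article.

```r
# Five observations from the example
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)

n    <- length(x)   # number of observations
xbar <- mean(x)     # Xbar
ybar <- mean(y)     # ybar

# Numerator and denominator of the slope formula
sxy <- sum(x * y) - n * xbar * ybar   # S(X,y)
ssx <- sum(x^2)   - n * xbar^2        # SS(X)

b1 <- sxy / ssx          # slope
b0 <- ybar - b1 * xbar   # intercept

c(b0 = b0, b1 = b1)
```

Printing `sxy` and `ssx` separately makes it easy to compare each intermediate quantity against the numbers worked out by hand.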
You can verify these results with PROC REG in SAS:

```sas
/* GENERATE DATA */
DATA REGDAT;
INPUT X Y;
CARDS;
1 3
2 7
3 5
4 11
5 14
;
RUN;

/* BASIC REGRESSION WITH PROC REG */
PROC REG DATA = REGDAT;
MODEL Y = X;
RUN;
QUIT;
```
Similarly, this can be done in R using the 'lm' function:

```r
#------------------------------------------------------------
#  regression with canned lm routine
#------------------------------------------------------------

# read in data manually
x <- c(1, 2, 3, 4, 5)      # read in x-values
y <- c(3, 7, 5, 11, 14)    # read in y-values

data1 <- data.frame(x, y)  # create data set combining x and y values

# analysis
plot(data1$x, data1$y)           # plot data
reg1 <- lm(y ~ x, data = data1)  # compute regression estimates
summary(reg1)                    # print regression output
abline(reg1)                     # plot fitted regression line
```
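As a quick check that the fitted line really satisfies the two first-order conditions from the derivation, the residuals can be inspected directly. This is a sketch under the assumption that the same five data points are used; `fit` and `e` are names introduced here for illustration.

```r
# Same five observations as in the example
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)

fit <- lm(y ~ x)   # least squares fit
coef(fit)          # intercept and slope

e <- resid(fit)    # residuals e_i = y_i - b0 - b1 * x_i

# First-order conditions: both sums should be zero up to rounding error
sum(e)       # condition for b0
sum(x * e)   # condition for b1
```

Both sums come out as tiny floating point values rather than exact zeros, which is expected from numerical rounding.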

Regression Matrices
Alternatively, this problem can be represented in matrix format.
We can then formulate the regression model as:

$$y = Xb + e$$

where the 'errors' or deviations from the fitted line are given by the vector:

$$e = y - Xb$$

The matrix equivalent of $\sum e_i^2$ becomes

$$e'e = (y - Xb)'(y - Xb) = y'y - 2 b'X'y + b'X'Xb$$

Taking partials, setting them equal to zero, and solving for $b$ gives:

$$\frac{\partial\, e'e}{\partial b} = -2 X'y + 2 X'Xb = 0$$

$$2 X'Xb = 2 X'y$$

$$X'Xb = X'y$$

$$b = (X'X)^{-1} X'y$$

which is the matrix equivalent of what we had before. With a column of 1's and a single regressor in $X$, the vector $b$ holds both the intercept and the slope, and its slope element reduces to

$$b_1 = \frac{\sum X_i y_i - n \bar{X} \bar{y}}{\sum X_i^2 - n \bar{X}^2} = \frac{S(X,y)}{SS(X)}$$
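To make the matrix formula concrete, here is a worked sketch for the five observations of the example, where $\sum X_i = n\bar{X} = 15$ and $\sum y_i = n\bar{y} = 40$ follow from the sums given earlier:

$$
X'X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}
    = \begin{bmatrix} 5 & 15 \\ 15 & 55 \end{bmatrix},
\qquad
X'y = \begin{bmatrix} \sum y_i \\ \sum X_i y_i \end{bmatrix}
    = \begin{bmatrix} 40 \\ 146 \end{bmatrix}
$$

$$
(X'X)^{-1} = \frac{1}{5 \cdot 55 - 15^2} \begin{bmatrix} 55 & -15 \\ -15 & 5 \end{bmatrix}
           = \frac{1}{50} \begin{bmatrix} 55 & -15 \\ -15 & 5 \end{bmatrix}
$$

$$
b = (X'X)^{-1} X'y
  = \frac{1}{50} \begin{bmatrix} 55 \cdot 40 - 15 \cdot 146 \\ -15 \cdot 40 + 5 \cdot 146 \end{bmatrix}
  = \frac{1}{50} \begin{bmatrix} 10 \\ 130 \end{bmatrix}
  = \begin{bmatrix} 0.2 \\ 2.6 \end{bmatrix}
$$

The first element is the intercept $b_0$ and the second is the slope $b_1$, the same values obtained from the scalar formulas.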

These computations can be carried out in SAS via PROC IML commands:

```sas
/* MATRIX REGRESSION */
PROC IML;
TITLE 'REGRESSION MATRICES VIA PROC IML';
/* INPUT DATA AS VECTORS */
yt = {3 7 5 11 14};  /* TRANSPOSED Y VECTOR */
x0t = j(1,5,1);      /* ROW VECTOR OF 1'S */
x1t = {1 2 3 4 5};   /* X VALUES */
xt = x0t//x1t;       /* COMBINE VECTORS INTO TRANSPOSED X-MATRIX */
PRINT yt x0t x1t;
/* FORMULATE REGRESSION MATRICES */
y = yt`;   /* VECTOR OF DEPENDENT VARIABLES */
x = xt`;   /* FULL X OR DESIGN MATRIX */
beta = inv(x`*x)*x`*y;  /* THE CLASSICAL REGRESSION MATRIX */
PRINT beta;
QUIT;
```

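OUTPUT: the printed beta vector holds the intercept (0.2) and the slope (2.6), matching the hand calculation above.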

The same results can be obtained in R as follows:

```r
#------------------------------------------------------------
#   matrix programming based approach
#------------------------------------------------------------

# regression matrices require a column of 1's in order to calculate
# the intercept or constant, create this column of 1's as x0
x0 <- c(1, 1, 1, 1, 1)  # column of 1's
x1 <- c(1, 2, 3, 4, 5)  # original x-values

# create the x-matrix of explanatory variables
x <- as.matrix(cbind(x0, x1))

# create the y-matrix of dependent variables
y <- as.matrix(c(3, 7, 5, 11, 14))

# estimate  b = (X'X)^-1 X'y
b <- solve(t(x) %*% x) %*% t(x) %*% y

print(b)  # this gives the intercept and slope - matching exactly
          # the results above
```
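Forming the inverse explicitly mirrors the textbook formula, but the same estimate can be obtained by solving the normal equations X'Xb = X'y directly. The sketch below is one possible variant, not the author's method; it uses base R's `crossprod()` and the two-argument form of `solve()`, and the names `X`, `XtX`, and `Xty` are illustrative.

```r
# Same five observations as in the example
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 5, 11, 14)

# Design matrix: a column of 1's for the intercept, then the x-values
X <- cbind(intercept = 1, x = x)

XtX <- crossprod(X)      # X'X
Xty <- crossprod(X, y)   # X'y

# Solve X'X b = X'y without computing the inverse explicitly
b <- solve(XtX, Xty)
b

# Compare with the canned lm routine
fit <- lm(y ~ x)
all.equal(as.vector(b), unname(coef(fit)))
```

Solving the linear system avoids computing an explicit inverse, which tends to be the more numerically stable choice when the columns of the design matrix are close to collinear.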
