More on Orthogonal Regression

[This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers.]

Some time ago I wrote a post about orthogonal regression. This is where we fit a regression line so that we minimize the sum of the squares of the orthogonal (rather than vertical) distances from the data points to the regression line.
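For the single-regressor case, the idea can be sketched in a few lines of Python with NumPy. This is only an illustration on simulated data, and the function name `orthogonal_line` is made up for the example. It relies on the fact that the line minimizing the sum of squared orthogonal distances passes through the point of means and runs along the first principal axis of the centred (x, y) data.

```python
import numpy as np

def orthogonal_line(x, y):
    """Return (intercept, slope) of the line minimizing squared orthogonal distances."""
    data = np.column_stack([x, y])
    centre = data.mean(axis=0)
    # Rows of vt are the principal axes of the centred (x, y) cloud.
    _, _, vt = np.linalg.svd(data - centre, full_matrices=False)
    dx, dy = vt[0]                 # direction of the first principal axis
    slope = dy / dx                # assumes the fitted line is not vertical
    intercept = centre[1] - slope * centre[0]
    return intercept, slope

# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

a_orth, b_orth = orthogonal_line(x, y)
b_ols, a_ols = np.polyfit(x, y, 1)   # ordinary (vertical-distance) least squares
print("orthogonal:", a_orth, b_orth)
print("OLS:       ", a_ols, b_ols)
```

The orthogonal slope is never smaller in absolute value than the OLS slope from regressing y on x, so the two fitted lines will generally differ.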

Subsequently, I received the following email comment:
“Thanks for this blog post. I enjoyed reading it. I’m wondering how straightforward you think this would be to extend orthogonal regression to the case of two independent variables? Assume both independent variables are meaningfully measured in the same units.”
Well, we don’t have to make the latter assumption about units in order to answer this question. And we don’t have to limit ourselves to just two regressors. Let’s suppose that we have p of them.

In fact, I hint at the answer to the question posed above towards the end of my earlier post, when I say, “Finally, it will come as no surprise to hear that there’s a close connection between orthogonal least squares and principal components analysis.”

What was I referring to, exactly?
Well, just recall how we define the Principal Components of a multivariate set of data. Suppose that the data are in the form of an (n × p) matrix, X. There are n observations, and p variables. An orthogonal transformation is applied to X. This results in r (≤ p) new variables that are linearly uncorrelated. These are the principal components (PC’s) of the data, and they are ordered as follows. The first PC accounts for as much of the variability in the original data as possible. The second PC accounts for the maximum amount of the remaining variability in the data, subject to the constraint that it is uncorrelated with (i.e., orthogonal to) the first PC.

Note how orthogonality has crept into the story!

We then continue – the third PC accounts for the maximum amount of the remaining variability in the data, subject to the constraint that it is orthogonal to both the first and second PC’s, and so on.
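The construction just described can be checked numerically. The sketch below, in Python with NumPy, obtains the PC's from a singular value decomposition of the centred data matrix. The function name `principal_components` and the simulated X are assumptions made for the illustration, and the SVD is just one common route to the same result.

```python
import numpy as np

def principal_components(X):
    """Return (scores, loadings, variances) for the columns of X."""
    Xc = X - X.mean(axis=0)                    # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                             # the PC's; same as Xc @ Vt.T
    variances = s**2 / (X.shape[0] - 1)        # variance accounted for by each PC
    return scores, Vt.T, variances

# Simulated (n x p) data with correlated columns (illustrative only).
rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

scores, loadings, variances = principal_components(X)
cov = np.cov(scores, rowvar=False)             # covariance matrix of the PC's
print(np.round(cov, 6))                        # off-diagonal entries are zero
print(variances / variances.sum())             # share of variability, in decreasing order
```

The PC's are mutually uncorrelated, their variances come out in decreasing order, and those variances add up to the total variance of the original variables.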

You’ll find examples of PC analysis being used in a statistically descriptive way in some earlier posts of mine – e.g., here and here.

We can use (some of) the PC’s of the regressor data as explanatory variables in a regression model. A useful reference for this can be found here. Note that, by construction, these transformed explanatory variables will have zero multicollinearity.
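A minimal sketch of this idea, in Python with NumPy, is given here. It regresses y on the first k PC's of the regressor data and then maps the estimates back to coefficients on the original regressors. The function name `pcr_fit`, the choice k = 2, and the simulated data are all assumptions for the illustration, not something taken from the references mentioned in this post.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Least squares of y on the first k PC's of X; returns (intercept, beta)."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                             # loadings of the first k PC's (p x k)
    Z = Xc @ V_k                               # the k PC's; their columns are orthogonal
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = V_k @ gamma                         # implied coefficients on the original regressors
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

# Simulated data with strongly correlated regressors (illustrative only).
rng = np.random.default_rng(2)
n, p = 150, 3
common = rng.normal(size=(n, 1))
X = common + 0.3 * rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

intercept_k, beta_k = pcr_fit(X, y, k=2)       # drop the last PC
intercept_full, beta_full = pcr_fit(X, y, k=p) # keep all of the PC's
print(intercept_k, beta_k)
print(intercept_full, beta_full)
```

When all p PC's are retained, the implied coefficients coincide with those from ordinary least squares on the original regressors. It is only when some PC's are dropped that the two sets of estimates differ.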

So, in the multivariate case, orthogonal regression is just least squares regression using a sub-set of the principal components of the original regressor matrix as the explanatory variables. We also sometimes call it Total Least Squares.

In this earlier post I talked about using Principal Components Regression (PCR) in the context of simultaneous equations models. The problem there was that we can’t construct the 2SLS estimator if the sample size is smaller than the total number of predetermined variables in the entire system. (This used to be referred to as the “under-sized sample” problem.) One solution was to use a few of the principal components of the matrix of data on the predetermined variables, instead of all of the latter variables, at the first stage of 2SLS. (Usually, just the first few principal components will capture almost all of the variability in the original data.)
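The “under-sized sample” fix can be sketched as follows, again in Python with NumPy. This is a stylized illustration with a single endogenous regressor and simulated data, not the procedure from the earlier post. The function names, the number of retained PC's (k = 5) and the data-generating process are all assumptions.

```python
import numpy as np

def first_k_pcs(Z, k):
    """First k principal components of the columns of Z."""
    Zc = Z - Z.mean(axis=0)
    U, s, _ = np.linalg.svd(Zc, full_matrices=False)
    return U[:, :k] * s[:k]

def two_stage_with_pcs(y, Y1, Z, k):
    """Two-stage estimator using k PC's of Z, rather than Z itself, at stage one."""
    n = len(y)
    W = np.column_stack([np.ones(n), first_k_pcs(Z, k)])
    # Stage 1: regress the endogenous regressor on a constant and the PC's.
    pi_hat, *_ = np.linalg.lstsq(W, Y1, rcond=None)
    Y1_hat = W @ pi_hat
    # Stage 2: regress y on a constant and the stage-1 fitted values.
    R = np.column_stack([np.ones(n), Y1_hat])
    coef, *_ = np.linalg.lstsq(R, y, rcond=None)
    return coef                                # (intercept, coefficient on Y1)

# Simulated system with more predetermined variables than observations.
rng = np.random.default_rng(3)
n, m = 20, 30
Z = rng.normal(size=(n, m))
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(scale=0.5, size=n)    # u correlated with v, so Y1 is endogenous
Y1 = Z[:, :3].sum(axis=1) + v
y = 0.5 + 2.0 * Y1 + u

print(np.linalg.matrix_rank(Z), "<", m)        # Z'Z is singular, so the usual stage 1 fails
print(two_stage_with_pcs(y, Y1, Z, k=5))
```

With n = 20 and m = 30, the matrix Z'Z cannot be inverted, whereas a first stage based on five PC's is well defined.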

There are some useful discussions of this that you might want to refer to. For instance, Vincent Zoonekynd has a nice illustration here. I particularly recommend two other pieces that discuss PCR using R – this post, “Principal components regression in R, an operational tutorial”, by John Mount, on the Revolutions blog; and this post, “Performing principal components regression (PCR) in R”, by Michy Alice, on the Quantide site.

PCR also gets a brief mention in this earlier post of mine – see the discussion of the last paper mentioned in that post.

So, the bottom line is that while my introductory post dealt with just the single-regressor case, it’s straightforward to apply orthogonal multiple regression – it’s just regression using the first few principal components of the regressor matrix.

© 2016, David E. Giles
