Principal Components Regression in R: Part 2

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

by John Mount Ph. D.
Data Scientist at Win-Vector LLC

In part 2 of her series on Principal Components Regression Dr. Nina Zumel illustrates so-called y-aware techniques. These often neglected methods use the fact that for predictive modeling problems we know the dependent variable, outcome or y, so we can use this during data preparation in addition to using it during modeling. Dr. Zumel shows the incorporation of y-aware preparation into Principal Components Analyses can capture more of the problem structure in fewer variables. Such methods include:

  • Effects based variable pruning
  • Significance based variable pruning
  • Effects based variable scaling.

This recovers more domain structure and leads to better models. Using the foundation set in the first article Dr. Zumel quickly shows how to move from a traditional x-only analysis that fails to preserve a domain-specific relation of two variables to outcome to a y-aware analysis that preserves the relation. Or in other words how to move away from a middling result where different values of y (rendered as three colors) are hopelessly intermingled when plotted against the first two found latent variables as shown below.


Dr. Zumel shows how to perform a decisive analysis where y is somewhat sortable by the each of the first two latent variable and the first two latent variables capture complementary effects, making them good mutual candidates for further modeling (as shown below).

NewImage Click here (part 2 y-aware methods) for the discussion, examples, and references. Part 1 (x only methods) can be found here.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)