Principal Components Regression in R: Part 2

May 24, 2016

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

by John Mount Ph. D.
Data Scientist at Win-Vector LLC

In part 2 of her series on Principal Components Regression Dr. Nina Zumel illustrates so-called y-aware techniques. These often neglected methods use the fact that for predictive modeling problems we know the dependent variable, outcome or y, so we can use this during data preparation in addition to using it during modeling. Dr. Zumel shows the incorporation of y-aware preparation into Principal Components Analyses can capture more of the problem structure in fewer variables. Such methods include:

  • Effects based variable pruning
  • Significance based variable pruning
  • Effects based variable scaling.

This recovers more domain structure and leads to better models. Using the foundation set in the first article Dr. Zumel quickly shows how to move from a traditional x-only analysis that fails to preserve a domain-specific relation of two variables to outcome to a y-aware analysis that preserves the relation. Or in other words how to move away from a middling result where different values of y (rendered as three colors) are hopelessly intermingled when plotted against the first two found latent variables as shown below.


Dr. Zumel shows how to perform a decisive analysis where y is somewhat sortable by the each of the first two latent variable and the first two latent variables capture complementary effects, making them good mutual candidates for further modeling (as shown below).

NewImage Click here (part 2 y-aware methods) for the discussion, examples, and references. Part 1 (x only methods) can be found here.

To leave a comment for the author, please follow the link and comment on their blog: Revolutions. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)