Some articles on the Win Vector blog have become damaged or hard to access. I (John Mount) apologize for that.
In particular, the graphs are missing from Dr. Nina Zumel’s wonderful y-aware Principal Components regression series. The complete
.Rmd files that generated the articles are easy to get to and fix this problem, so I am posting the links here.
I see these articles not so much as anything that will make us popular, but as eliminating some documentation debt to our clients and partners who have benefited from the methods. It is also a chance for me to properly credit Dr. Nina Zumel: being the noisier Win Vector partner, I tend to be credited with a lot of things that turn out to have been her work.
Y-aware PCA is a technique we have used with great success for a lot of clients. The nearest methods are L2-regularized regression and standard PCA/PCR. Y-aware PCA differs from L2-regularized regression in how noise variables are treated, making it much more resistant to over-fit. It differs from standard scaled PCA/PCR in being aggressive about filtering out irrelevant variables.
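To make the core idea concrete, here is a minimal sketch of y-aware scaling in Python with NumPy (this is my own illustration, not Win Vector's implementation; all function and variable names here are hypothetical): each input column is rescaled by its univariate regression slope against the outcome y before PCA, so columns unrelated to y are shrunk toward zero and contribute little to the principal components.

```python
import numpy as np

def y_aware_scale(X, y):
    # For each column x_j, fit the univariate regression y ~ x_j
    # and rescale the centered column by the fitted slope b_j.
    # Noise columns get slopes near zero, so they are shrunk away
    # before PCA ever sees them.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    slopes = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)
    return Xc * slopes

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
noise = rng.normal(size=(n, 5))            # five irrelevant variables
X = np.column_stack([signal, noise])
y = 2.0 * signal + 0.1 * rng.normal(size=n)

Xs = y_aware_scale(X, y)
# PCA via singular value decomposition of the y-aware scaled matrix.
_, s, _ = np.linalg.svd(Xs, full_matrices=False)
# The first singular value dominates: the y-relevant variation is
# concentrated in a single component, while the noise columns,
# scaled by their near-zero slopes, contribute almost nothing.
print(s / s.sum())
```

On this synthetic example the first component carries most of the scaled variance, which is the behavior the paragraph above describes: irrelevant variables are filtered aggressively rather than carried along as in standard scaling.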
Without further delay, here is the article series with figures intact!
- Principal Components Regression, Pt.1: The Standard Method
- Principal Components Regression, Pt. 2: Y-Aware Methods
- Principal Components Regression, Pt. 3: Picking the Number of Components
- Principal Components Regression, Pt. 4: Y-Aware Methods for Classification
I also wrote a bit of an advertisement for the first three parts of the series here: Why you should read Nina Zumel’s 3 part series on principal components analysis and regression. Be aware that it links to the damaged articles, not to the repaired articles listed above.
In addition to the above demonstrations of the method’s effectiveness, we have some proofs of effectiveness in specialized situations, but we haven’t written those up in a shareable form yet.