359 search results for "PCA"

In search of an incredible posterior

June 22, 2016

What is credibility? For over one hundred years, actuaries have been wrestling with the idea of “credibility”. This is the process whereby one may make a quantitative assessment of the predictive power of sample data. Where necessary, the researcher augments the sample with some exogenous information - usually more data - to arrive at a final conclusion. In...


y-aware scaling in context

June 22, 2016

Nina Zumel introduced y-aware scaling in her recent article Principal Components Regression, Pt. 2: Y-Aware Methods. I really encourage you to read the article and add the technique to your repertoire. The method combines well with other methods and can drive better predictive modeling results. From feedback I am not sure everybody noticed that in … Continue reading...


Risk Models with Generalized PLS

June 12, 2016

While developing risk models with hundreds of potential variables, we often find that risk characteristics or macro-economic indicators are highly correlated, a situation known as multicollinearity. In such cases, we might have to drop variables with high VIFs or employ “variable shrinkage” methods, e.g. lasso or ridge, to suppress variables with collinearity. Feature extraction approaches


Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

June 9, 2016

Short form: Win-Vector LLC’s Dr. Nina Zumel has a three part series on Principal Components Regression that we think is well worth your time. Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression). Part 2: the introduction of y-aware scaling to direct the … Continue reading...


What are the Best Machine Learning Packages in R?

June 6, 2016

Guest post by Khushbu Shah. The most common question asked by prospective data scientists is – “What is the best programming language for Machine Learning?” The answer to this question always results in a debate over whether to choose R, Python or MATLAB for Machine Learning. Nobody can, in reality, answer the question as to whether Python or R is best...


Building the Data Matrix for the Task at Hand and Analyzing Jointly the Resulting Rows and Columns

June 5, 2016

Someone decided what data ought to go into the matrix. They placed the objects of interest in the rows and the features that differentiate among those objects into the columns. Decisions were made either to collect information or to store what was gathered for other purposes (e.g., data mining). A set of mutually constraining choices...


Principal Components Regression, Pt. 3: Picking the Number of Components

May 30, 2016

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in … Continue reading...


Principal Components Regression, Pt. 2: Y-Aware Methods

May 23, 2016

In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what we call Y-Aware Principal Components … Continue reading...


Tutorial: GitHub for Data Scientists without the Terminal

May 21, 2016

Git and GitHub are indispensable tools for anyone analysing data, developing software or disseminating results. Originally designed for software engineers, GitHub is now widely used in many disciplines, especially by researchers in academia. Having source code management software such as GitHub to host your code and keep detailed project documentation is a huge step


Installing WVPlots and “knitting R markdown”

May 20, 2016

Some readers have been having a bit of trouble using devtools to install WVPlots. I thought I would write a note with a few instructions to help. These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten about. First you will need … Continue reading...

