Tidbit: Correlation and Simple Linear Regression

October 19, 2012

(This article was first published on Kevin Davenport » R, and kindly contributed to R-bloggers)

In business, “correlation” is used loosely to mean a mutual relationship or connection between two or more things; statistically speaking, correlation is the interdependence of variable quantities. I often overhear end users ask for the correlation of variables so they can make predictions, when what they actually want is simple linear regression. I don’t mean to outline all the math behind either method; rather, I’d like to clarify the fundamental difference in reasoning for the business user.

Whether you are examining the data in Excel via CORREL(), R via cor(), or MATLAB via corrcoef(x,y), correlation is best used when X and Y are both variables you simply measure, with neither designated as the input. Simple linear regression is the better fit when you control X and measure Y. Time allowed to bake or grams of baking soda used are variables you might control (X), whereas the height or density of the resulting cake would be the output variable (Y).
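One way to see the distinction is that correlation treats the two variables symmetrically, while regression depends on which variable is treated as the input. The sketch below uses only the Python standard library. The baking numbers are made up for illustration, and the helper names (`pearson_r`, `slope`) are my own, not from any package.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical illustrative data, not real measurements
bake_minutes = [20, 25, 30, 35, 40]
cake_height_cm = [4.1, 5.0, 5.6, 6.5, 6.8]

# Correlation does not care which variable comes first
r_xy = pearson_r(bake_minutes, cake_height_cm)
r_yx = pearson_r(cake_height_cm, bake_minutes)

# Regression does: height-on-time and time-on-height give different slopes
slope_y_on_x = slope(bake_minutes, cake_height_cm)
slope_x_on_y = slope(cake_height_cm, bake_minutes)

print(r_xy, r_yx)
print(slope_y_on_x, slope_x_on_y)
```

Swapping the arguments leaves the correlation unchanged but produces a different regression slope, which is why you have to decide which variable is the input before fitting a regression.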

Similarities:

• In simple linear regression, the standardized regression coefficient is the same as Pearson’s correlation coefficient (as opposed to Kendall’s or Spearman’s rank correlation coefficients).
• The square of Pearson’s correlation coefficient is the same as the R² in simple linear regression. R² is a statistical measure of goodness of fit: how well the regression line approximates the real data points. For example, an R² of 1.0 (the maximum value) would indicate that the regression line fits the data perfectly.
• Neither correlation nor simple linear regression establishes causality on its own.
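The first two similarities can be checked numerically. This is a minimal sketch using only the Python standard library and made-up data; the variable names are my own choices. It fits the least-squares line by hand, rescales the slope by the ratio of standard deviations, and computes R² from the residuals.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

# Hypothetical illustrative data, not real measurements
x = [20, 25, 30, 35, 40]
y = [4.1, 5.0, 5.6, 6.5, 6.8]

mx, my = mean(x), mean(y)
sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares of x
syy = sum((b - my) ** 2 for b in y)                    # total sum of squares of y
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-product sum

# Pearson's correlation coefficient
r = sxy / sqrt(sxx * syy)

# Least-squares fit of y = alpha + beta * x
beta = sxy / sxx
alpha = my - beta * mx

# Standardized slope: beta * sd(x) / sd(y); the (n - 1) terms cancel
beta_std = beta * sqrt(sxx / syy)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
fitted = [alpha + beta * a for a in x]
ss_res = sum((b - f) ** 2 for b, f in zip(y, fitted))
r_squared = 1 - ss_res / syy

print(r, beta_std)         # these two agree
print(r ** 2, r_squared)   # and so do these
```

Both identities hold only for simple (one-predictor) linear regression with an intercept, which is the case discussed here.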

Differences:

• The regression equation (y=α+βx) can be used to predict Y from values of X; a correlation coefficient alone cannot.
• “Correlation” usually refers to a linear relationship (as measured by Pearson’s coefficient), but the term can also cover other forms of dependence, such as polynomial or other nonlinear relationships.
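To make the prediction point concrete, here is a small sketch of fitting y=α+βx and using it for a new value of X. It relies only on the Python standard library. The data are made up, and `fit_line` and `predict` are hypothetical helper names of my own, not functions from any library.

```python
def fit_line(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

def predict(alpha, beta, x_new):
    """Evaluate the fitted regression equation at a new x value."""
    return alpha + beta * x_new

# Hypothetical illustrative data, not real measurements
bake_minutes = [20, 25, 30, 35, 40]
cake_height_cm = [4.1, 5.0, 5.6, 6.5, 6.8]

alpha, beta = fit_line(bake_minutes, cake_height_cm)

# Predict the height for a baking time inside the observed range
predicted_height = predict(alpha, beta, 32)
print(alpha, beta, predicted_height)
```

The prediction is for a baking time within the range of the data; extrapolating far beyond the observed X values is much less trustworthy.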
