Tidbit: Correlation and Simple Linear Regression

[This article was first published on Kevin Davenport » R, and kindly contributed to R-bloggers].

In business, “correlation” is used loosely to mean a mutual relationship or connection between two or more things; statistically speaking, correlation is the interdependence of variable quantities. I often overhear end users request the correlation of variables for prediction purposes, when what they actually want is simple linear regression. I don’t mean to outline all the math behind either technique; rather, I’d like to clarify the fundamental reasoning for the business user.

Whether you are examining the data in Excel via CORREL(), R via cor(), or MATLAB via corrcoef(x,y), correlation is best used when X and Y are both variables you measure and neither is one you set. Simple linear regression is the better fit when you control X and measure Y. Time allowed to bake or grams of baking soda used are variables you might control (X), whereas the height or density of the resulting cake is the output variable (Y).

Similarities:

  • In simple linear regression, the standardized regression coefficient is the same as Pearson’s correlation coefficient (as opposed to Kendall’s or Spearman’s rank coefficients).
  • The square of Pearson’s correlation coefficient is the same as the R² in simple linear regression. R² describes the goodness of fit of a model: in regression, it measures how well the regression line approximates the real data points. For example, if R² were to equal 1.0 (its maximum value), the regression line would fit the data perfectly.
  • Correlation and simple linear regression do not provide answers to causality directly.
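
The first two similarities are easy to check numerically. Here is a minimal R sketch using simulated baking data; the variable names (bake_minutes, cake_height) and the numbers used to generate them are made up for illustration and are not taken from any real dataset.

```r
# Simulated data: hypothetical values, for illustration only
set.seed(42)
bake_minutes <- runif(50, min = 20, max = 40)                   # X
cake_height  <- 2 + 0.15 * bake_minutes + rnorm(50, sd = 0.5)   # Y

# Pearson's correlation coefficient
r <- cor(bake_minutes, cake_height, method = "pearson")

# Standardize both variables (z-scores), then regress one on the other
z_x <- (bake_minutes - mean(bake_minutes)) / sd(bake_minutes)
z_y <- (cake_height - mean(cake_height)) / sd(cake_height)
std_slope <- unname(coef(lm(z_y ~ z_x))["z_x"])

# R-squared from the regression on the raw (unstandardized) variables
raw_fit   <- lm(cake_height ~ bake_minutes)
r_squared <- summary(raw_fit)$r.squared

# Both comparisons should return TRUE (up to floating-point tolerance)
all.equal(std_slope, r)
all.equal(r_squared, r^2)
```

Note that the slope of the regression on the raw variables is generally not equal to r; the match only holds once both variables are standardized.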

Differences:

  • The regression equation (y=α+βx) can be used to make predictions on Y based on values of X.
  • Correlation usually refers to linear relationships, but it can refer to other forms of dependence such as polynomial or truly nonlinear relationships.
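
Both differences can be sketched in a few lines of R. The data below are simulated, and the names (baking_soda_g, cake_height_cm, cakes, new_cakes) are hypothetical; treat this as an illustration under those assumptions rather than a recipe.

```r
# Simulated data: hypothetical values, for illustration only
set.seed(1)
baking_soda_g  <- seq(2, 10, by = 0.5)                          # X, controlled
cake_height_cm <- 3 + 0.4 * baking_soda_g +
  rnorm(length(baking_soda_g), sd = 0.3)                        # Y, measured
cakes <- data.frame(baking_soda_g, cake_height_cm)

# Fit y = alpha + beta * x
fit   <- lm(cake_height_cm ~ baking_soda_g, data = cakes)
alpha <- coef(fit)[["(Intercept)"]]
beta  <- coef(fit)[["baking_soda_g"]]

# Predict Y for new values of X, via predict() and via the equation itself
new_cakes <- data.frame(baking_soda_g = c(4.2, 7.5))
predicted <- predict(fit, newdata = new_cakes)
manual    <- alpha + beta * new_cakes$baking_soda_g
all.equal(unname(predicted), manual)

# A monotonic but nonlinear relationship: the rank-based Spearman
# coefficient treats it as a perfect association, Pearson does not
x <- seq(0, 5, by = 0.25)
y <- exp(x)
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
```

A correlation coefficient on its own gives you a single number describing the strength of association; it is the fitted α and β that let you plug in a new X and get a predicted Y.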
