Create a Correlation Matrix in R

November 21, 2016
By

(This article was first published on (R)very Day, and kindly contributed to R-bloggers)

So, in my last post, I showed how to create two histograms from a certain data set and then how to plot the two variables to see if there is any relationship. Visually, it was easy to tell that there was a negative relationship between the weight of an automobile and the fuel economy of an automobile. But, is there a more objective way to understand the relationship? Is there a number we can assign to it?

Yes, it turns out there is. This number is called Pearson’s Correlation Coefficient or, in the vernacular, simply the “correlation.” Essentially, this number measures the percentage of fluctuation in one variable that can be explained by another variable. A correlation of 1 means the variables move in perfect unison, a correlation of -1 means the variables move in the complete opposite direction, and a correlation of 0 means there is no relationship at all between the two variables.

So, how to we retrieve the correlation between two variables in R? Let’s write some code…

motorcars <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv", stringsAsFactors = FALSE)

cor(motorcars$wt, motorcars$mpg)

plot(motorcars$wt, motorcars$mpg)

First, we import the same data set we used last time. When we view the data set (using colnames() or head()), we see that the column names for the variables we are trying to measure are “wt” and “mpg.” Now, all we need to do is subset these two variables with the dollar sign and place them within the cor() function.

mccor

When we run this code, we can see that the correlation is -0.87, which means that the weight and the mpg move in exactly opposite directions roughly 87% of the time. So, that’s it. You’ve run a correlation in R. If you plot the two variables using the plot() function, you can see that this relationship is fairly clear visually.

mtplot

But, wait? Could there be other things that are related to the fuel economy of the vehicle, besides weight? What else is in the data set? Let’s have a look. When we run the head() function on motorcars, we get the first 6 rows of every column in the data set.

headmc

What if we want to see how all of these variables are related to one another? Well, we could run a correlation on every single combination we can think of, but that would be tedious. Is there a way we can view all the correlations with a single line of code? Yes, there is.

mc_data <- motorcars[,2:length(motorcars)]
round(cor(mc_data),2)

First, we create a separate data frame that only includes the data from motorcars (subsets everything to the right of the vehicle model name). Then, we simply run a correlation on the new data frame, which we’ve called “mc_data.” To clean things up a bit, I’ve nested the cor() function within the round() function to round the result to two decimal places. When we enter this code, here’s what we get:

corrmat

We can see that there are several other variables that are related to mpg, such as cyl, disp, and hp. Now, we can plot the variables that are most correlated with miles per gallon using this code (refer to previous post for explanation).

par(mfrow = c(2,2))
plot(motorcars$wt, motorcars$mpg)
plot(motorcars$cyl, motorcars$mpg)
plot(motorcars$disp, motorcars$mpg)
plot(motorcars$hp, motorcars$mpg)

And, here’s what we get as a result…

mtplots

To leave a comment for the author, please follow the link and comment on their blog: (R)very Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)