Correlation and Correlogram Exercises

April 8, 2017
By

(This article was first published on R-exercises, and kindly contributed to R-bloggers)


Correlation analysis is one of the most popular techniques for data exploration. This set of exercises is intended to help you extend, speed up, and validate your correlation analysis. It lets you practice:
– calculating linear and nonlinear correlation coefficients,
– testing those coefficients for statistical significance,
– creating correlation matrices to study interdependence between variables in dataframes,
– drawing graphical representations of those matrices (correlograms),
– calculating coefficients for partial correlation between two variables (controlling for their correlation with other variables).
The exercises make use of functions from the packages Hmisc, corrgram, and ggm. Please install these packages, but do not load them until you reach the exercises that need them. This avoids a namespace conflict: the ggm package contains a function called rcorr that masks the rcorr function from the Hmisc package, and vice versa. If you want to return to the rcorr function from Hmisc after loading ggm, run detach(package:ggm).
The exercises are based on a reduced version of the auto dataset from the corrgram package (download here). The dataset contains characteristics of 1979 automobile models.
Answers to the exercises are available here.

Exercise 1
Calculate the simple linear (Pearson) correlation between car price and fuel economy, measured in miles per gallon (mpg).
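
To show what the coefficient actually measures, here is a minimal sketch in Python (not R, and not the official answer) of the Pearson formula. It divides the sum of cross-products of deviations from the means by the square root of the product of the two sums of squared deviations. The function name pearson and all the numbers are made up for illustration and are not taken from the auto dataset.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values (NOT from the auto dataset): pricier cars, lower mpg
price = [9.5, 11.0, 12.5, 14.0, 17.5, 21.0]
mpg = [31, 28, 27, 24, 21, 17]
print(round(pearson(price, mpg), 3))
```

For made-up data like this the result is strongly negative. In R, cor(x, y) computes the same Pearson coefficient by default.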

Exercise 2
Use the cor.test function to check whether the obtained coefficient is statistically significant at the 5% level.
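
For a Pearson coefficient r computed from n pairs, the test that cor.test reports (as t with df = n - 2) is based on the statistic t = r * sqrt((n - 2) / (1 - r^2)). Under the null hypothesis of zero correlation, and assuming roughly normal data, this statistic follows a t distribution with n - 2 degrees of freedom. The Python sketch below computes only the statistic and its degrees of freedom. The p-value also needs the t distribution's CDF, which is left out here. The helper name pearson_t_statistic and the inputs are hypothetical.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic and degrees of freedom for H0: rho = 0, given Pearson r from n pairs."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    # t = r * sqrt(df / (1 - r^2)); its sign follows the sign of r
    t = r * math.sqrt(df / (1 - r * r))
    return t, df


# Hypothetical inputs: r = -0.5 observed on 30 cars
t_value, dof = pearson_t_statistic(-0.5, 30)
print(f"t = {t_value:.3f} on {dof} df")
```

With 28 degrees of freedom, the two-sided 5% critical value of the t distribution is about 2.048. An |t| above that value is therefore significant at the 5% level.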

Exercise 3
Simple correlation assumes a linear relationship between variables, but it may be useful to relax this assumption. Calculate Spearman's rank correlation coefficient, which measures any monotonic (not necessarily linear) relationship, for the same variables, and test its statistical significance.
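
Spearman's rho is the Pearson correlation computed on the ranks of the values, with tied values sharing the average of their ranks. That is why a perfectly monotonic but curved relationship gives a rho of exactly 1. The following Python sketch illustrates this with hypothetical helper names and made-up data. It computes only the coefficient; cor.test uses its own procedure for the Spearman p-value, which is not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up monotonic but strongly curved relationship
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 4, 8, 16, 64]
print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```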

Exercise 4
In R, it is possible to calculate correlation for all pairs of numeric variables in a dataframe at once. However, this requires excluding non-numeric variables first.
Create a new dataframe, auto_num, that contains only columns with numeric values from the auto dataframe. You can do this using the Filter function.
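
In R, a data frame is a list of columns, so Filter(is.numeric, auto) applies the predicate to each column and keeps the columns for which it returns TRUE. The Python sketch below mirrors that predicate-based idea on a hypothetical column-oriented table, stored as a dict of lists. The function name numeric_columns and the data are illustrative only. Booleans are excluded explicitly because Python treats them as integers.

```python
from numbers import Real


def numeric_columns(data):
    """Keep only the columns whose values are all real numbers (bools excluded)."""
    def is_numeric(column):
        return all(isinstance(v, Real) and not isinstance(v, bool) for v in column)

    # Like R's Filter: test each column with the predicate, keep those that pass
    return {name: col for name, col in data.items() if is_numeric(col)}


# Hypothetical table (NOT the auto dataset)
cars = {
    "Model": ["A", "B", "C"],
    "Price": [10500, 12900, 9800],
    "MPG": [24, 19, 27],
    "Origin": ["A", "E", "J"],
}
print(list(numeric_columns(cars)))
```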

Exercise 5
Use the cor function to create a matrix of correlation coefficients for variables in the auto_num dataframe.
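
A correlation matrix simply holds the pairwise coefficient for every pair of columns. It has ones on the diagonal and is symmetric, because cor(a, b) equals cor(b, a). The Python sketch below builds such a matrix for a hypothetical all-numeric table. It computes each pair once and mirrors the result. The names cor_matrix and table are illustrative, and the data are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cor_matrix(data):
    """Symmetric matrix of Pearson correlations for every pair of columns."""
    names = list(data)
    k = len(names)
    matrix = [[1.0] * k for _ in range(k)]  # each variable correlates 1 with itself
    for i in range(k):
        for j in range(i + 1, k):
            # Compute each pair once and mirror it across the diagonal
            matrix[i][j] = matrix[j][i] = pearson(data[names[i]], data[names[j]])
    return names, matrix


# Hypothetical numeric-only table (NOT the auto data)
table = {
    "Price": [9.5, 11.0, 12.5, 14.0, 17.5, 21.0],
    "MPG": [31, 28, 27, 24, 21, 17],
    "Weight": [2.1, 2.4, 2.6, 3.0, 3.4, 3.9],
}
names, m = cor_matrix(table)
for name, row in zip(names, m):
    print(f"{name:>7}", " ".join(f"{v:6.2f}" for v in row))
```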

Exercise 6
The standard cor.test function tests only one pair of variables at a time, so it cannot handle a whole dataframe. However, the statistical significance of all pairwise correlation coefficients in a dataframe can be checked with the rcorr function from the Hmisc package.
Transform the auto_num dataframe into a matrix (auto_mat), and use it to check the significance of the correlation coefficients with the rcorr function.
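
As described in the Hmisc documentation, rcorr deletes missing values pairwise rather than dropping every row with any missing value. It reports a matrix of coefficients (r), a matrix of the number of observations used for each pair (n), and a matrix of P-values (P). The Python sketch below reproduces the first two ideas for a hypothetical table that uses None as the missing-value marker. The P matrix would then follow from each r and its pairwise n through a t test on n - 2 degrees of freedom, which this sketch does not compute. The name pairwise_cor and the data are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_cor(data):
    """Correlations with pairwise deletion of missing values (None).

    Returns (names, r, n): r[i][j] uses only the rows where both columns are
    present, and n[i][j] counts those rows.
    """
    names = list(data)
    k = len(names)
    r = [[1.0] * k for _ in range(k)]
    n = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            pairs = [(a, b) for a, b in zip(data[names[i]], data[names[j]])
                     if a is not None and b is not None]
            n[i][j] = n[j][i] = len(pairs)
            if i != j:
                if len(pairs) < 2:
                    r[i][j] = r[j][i] = float("nan")  # too few complete pairs
                else:
                    xs, ys = zip(*pairs)
                    r[i][j] = r[j][i] = pearson(xs, ys)
    return names, r, n


# Hypothetical data with gaps (NOT the auto data)
sample = {
    "Price": [9.5, 11.0, None, 14.0, 17.5, 21.0],
    "MPG": [31, 28, 27, None, 21, 17],
    "Weight": [2.1, 2.4, 2.6, 3.0, 3.4, None],
}
names, r, n = pairwise_cor(sample)
print(n)
```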

Exercise 7
Use the corrgram function from the corrgram package to create a default correlogram to visualize correlations between variables in the auto dataframe.

Exercise 8
Create another correlogram that (1) includes only the lower panel, (2) uses pie diagrams to represent correlation coefficients, and (3) orders the variables using the default order.

Exercise 9
Create a new dataframe, auto_subset, by subsetting the auto dataframe to include only the Price, MPG, Hroom, and Rseat variables. Use the new dataframe to create a correlogram that (1) shows correlation coefficients on the lower panel, and (2) shows scatter plots (points) on the upper panel.

Exercise 10
Use the correlations function from the ggm package to create a correlation matrix with both full and partial correlation coefficients for the auto_subset dataframe. Find the partial correlation between car price and its fuel economy.
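
A standard way to obtain the partial correlation of each pair while controlling for all the other variables uses the inverse P of the correlation matrix: pcor(i, j) = -P[i][j] / sqrt(P[i][i] * P[j][j]). With Price, MPG, Hroom, and Rseat, the Price-MPG entry would therefore control for Hroom and Rseat. This sketch assumes that matches what the exercise means by partial correlation; it is not a description of how ggm computes its output. The Python below inverts the matrix with Gauss-Jordan elimination. The function names and the 3 x 3 correlation matrix are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix using Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for row_idx in range(n):
            if row_idx != col:
                factor = aug[row_idx][col]
                aug[row_idx] = [v - factor * w for v, w in zip(aug[row_idx], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of each pair of variables given all the others."""
    p = invert(corr)
    k = len(corr)
    return [[1.0 if i == j else -p[i][j] / math.sqrt(p[i][i] * p[j][j])
             for j in range(k)] for i in range(k)]


# Hypothetical correlation matrix for three variables (NOT from the auto data)
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
pcor = partial_correlations(R)
print(round(pcor[0][1], 3))
```

With three variables, this reduces to the familiar first-order formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). This identity makes a handy sanity check.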
