#21 – Find significant relationships in data with a CoCo Matrix

[This article was first published on Darren Wilkinson » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Screen Shot 2013-08-19 at 08.47.17

The CoCo Matrix (correlation coefficient matrix) is a script for R that takes a table headed with multiple variables and calculates the correlation coefficients between each of the variables, determines which are statistically significant, and represents them visually in a grid-plot. I created the CoCo Matrix to cross correlate a table with a large number of variables to quickly assess where important correlations could be found.

Screen Shot 2013-08-19 at 08.47.27

Using the CoCo Matrix

The R file can be downloaded here or copied from the textbox at the end of this post.

  1. If you know the number of samples in your dataset (n) then degrees of freedom (df) = n-2. Use this table to find the R value above which significant values lie. In the code, at the top you should change the value of “p” as per the value you just looked up. If you don’t know the value for n then run the code once and type “n” into the console.
  2. If you want, customise the colours in the customisation area of the code
  3. Run the code. A dialogue box will request a file. Alternatively replace the code to direct to the file you want to use.
  4. Voila!

This is a very rough script I wrote, and I intend to make it a lot better at some point when I have the time. If you have any suggestions  for improvements then please comment below or get in touch with me.

# CoCo Matrix version 1.0
# Written by Darren J. Wilkinson
# wilkinsondarren.wordpress.com
# [email protected]
#
# The "CoCo Matrix" visualises the correlation coefficients for a given set of data.
# Like-Like correlations are given NA values (e.g. Height vs Height = NA). For the moment
# duplicates such as Height vs. Weight and Weight vs. Height remain. At some point I'll 
# provide an update that removes duplicates like that.
#
# Please feel free to edit the code, and if you make any improvements please let me know
# either on wilkinsondarren.wordpress.com or send me an email at [email protected]

# Packages -------
library (cwhmisc)
library (ggplot2)
library (grid)
library (scales)
# ----------------

# Plot Customisation ----------------------------------------------------------
# (for good colour suggestions visit colourlovers.com)
col.significant = "#556270"			# Colour used for significant correlations
col.notsignificant = "lightgrey"		# Colour used for non-significant correlations
col.na = "white"						# Colour used for NA values
e1 = c("nb", "ta", "ba", "rb", "hf", "zr", "yb", "y", "th", "u")   #  p) {s = "Significant"}
		if (temp < p) {s = "Not Significant"}
		if (temp == 1) {s = NA}
		if (temp == 1) {temp = NA}
		results[h,i] = temp
		plot.data[r,4] = s
		plot.data[r,3] = temp
		plot.data[r,2] = h
		plot.data[r,1] = i
	}

}

# Open new quartz window
dev.new (
	width = 12, 
	height = 9
	)

# Plot the matrix
ggplot (data = plot.data, aes (x = x, y = y)) + 

geom_point (aes (colour = sig), size = 20) + 

scale_x_continuous (labels = e1, name = "", breaks = c(1:n.e1)) +

scale_y_continuous (labels = e1, name = "", breaks = c(1:n.e1)) +

scale_colour_manual (values = c(col.notsignificant, col.significant, col.na)) +

labs (title = "CoCo Matrix v1.0")+

theme (
	plot.title = element_text (vjust = 3, size = 20, colour = "black"), #plot title
	plot.margin = unit (c(3, 3, 3, 3), "lines"), #adjust the margins of the entire plot
	plot.background = element_rect (fill = "white", colour = "black"),
	panel.border = element_rect (colour = "black", fill = F, size = 1), #change the colour of the axes to black
	panel.grid.major = element_blank (), # remove major grid
	panel.grid.minor = element_blank (),  # remove minor grid
	panel.background = element_rect (fill = "white"), #makes the background transparent (white) NEEDED FOR INSIDE TICKS
	legend.background = element_rect (colour = "black", size = 0.5, fill = "white"),
	legend.justification = c(0, 0),
	#legend.position = c(0, 0), # put the legend INSIDE the plot area
	legend.key = element_blank (), # switch off the rectangle around symbols in the legend
	legend.box.just = "bottom",
	legend.box = "horizontal",
	legend.title = element_blank (), # switch off the legend title
	legend.text = element_text (size = 15, colour = "black"), #sets the attributes of the legend text#
	axis.title.x = element_text (vjust = -2, size = 20, colour = "black"), #change the axis title
	axis.title.y = element_text (vjust = -0.1, angle = 90, size = 20, colour = "black"), #change the axis title
	axis.text.x = element_text (size = 17, vjust = -0.25, colour = "black"), #change the axis label font attributes
	axis.text.y = element_text (size = 17, hjust = 1, colour = "black"), #change the axis label font attributes#
	axis.ticks = element_line (colour = "black", size = 0.5), #sets the thickness and colour of axis ticks
	axis.ticks.length = unit(-0.25 , "cm"), #setting a negative length plots inside, but background must be FALSE colour
	axis.ticks.margin = unit(0.5, "cm") # the margin between the ticks and the text
	)

# Print data tables in the console
results
plot.data

To leave a comment for the author, please follow the link and comment on their blog: Darren Wilkinson » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)