# Create a publication-ready correlation matrix, with significance levels, in R

*Originally published on **r – paulvanderlaken.com** and kindly contributed to R-bloggers.*


In most (observational) research papers you read, you will probably run into a **correlation matrix**. Often it looks something like this:
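For reference, base R's `cor` already produces such a matrix from a data frame of numeric columns. A minimal sketch using the built-in `mtcars` data (the three columns are an arbitrary choice for illustration):

```r
# Pairwise Pearson correlations between three numeric columns of mtcars
m <- cor(mtcars[, c("mpg", "wt", "hp")])

# Round for display; the result is a symmetric matrix with 1s on the diagonal
round(m, 2)
```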

In Social Sciences, like Psychology, researchers like to denote the **statistical significance levels** of the correlation coefficients, often using asterisks (i.e., *). Then the table will look more like this:
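The asterisks follow a simple lookup on the p-value. A sketch of the common convention (`p_to_stars` is a hypothetical helper; the cut-offs and the fixed three-character padding, which keeps columns aligned, are assumptions):

```r
# Hypothetical helper: map p-values to significance stars
# (*** p < .001, ** p < .01, * p < .05), padded to a width of 3 characters
p_to_stars <- function(p) {
  ifelse(is.na(p), "   ",
         ifelse(p < .001, "***",
                ifelse(p < .01, "** ",
                       ifelse(p < .05, "*  ", "   "))))
}

p_to_stars(c(0.0004, 0.003, 0.04, 0.2, NA))
```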

Regardless of my personal preferences and opinions, I had to make many of these tables for the scientific (non-)publications of my Ph.D.

I remember that, when I first started using R, I found it quite difficult to generate these correlation matrices automatically.

**Yes**, there is the `cor` function, but it does not include significance levels.
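To see the gap concretely: `cor` returns coefficients only, while base R's `cor.test` does return a p-value, but for just one pair of variables per call. A sketch on `mtcars`:

```r
# cor() returns only the coefficient ...
r <- cor(mtcars$mpg, mtcars$wt)

# ... while cor.test() also returns a p-value, but only for one pair at a time
test <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
test$estimate  # the same coefficient as r
test$p.value
```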

Then there is the (in)famous `Hmisc` package, with its `rcorr` function. But this tool brings a **whole new range of issues**.

What’s this `storage.mode`, and what are we trying to coerce again?

Soon you figure out that `Hmisc::rcorr` only takes in matrices *(thus with only numeric values)*. **Hurray**, now you can run a correlation analysis on your *dataframe*, you think…
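In base R, getting a data frame into that shape takes two steps: keep the numeric columns, then convert with `as.matrix`. A sketch with `iris`, whose `Species` factor column would otherwise turn the whole result into a character matrix:

```r
# Keep only numeric columns (iris's factor column Species is dropped) ...
num <- Filter(is.numeric, iris)

# ... and convert to a plain numeric matrix, the kind of input Hmisc::rcorr expects
x <- as.matrix(num)
storage.mode(x)
```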

Yet, the output is **all but publication-ready**!

You wanted one correlation matrix, but now you have two… **Double the trouble?**
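Merging the coefficient matrix and the p-value matrix into one table is the crux. Without `Hmisc`, a base-R sketch could compute the p-values pair by pair with `cor.test` and paste the stars onto the formatted coefficients (`cor_pvalues` is a hypothetical helper, not part of any package):

```r
# Hypothetical helper: pairwise p-values via cor.test(), shaped like cor()'s output
cor_pvalues <- function(x, method = "pearson") {
  k <- ncol(x)
  p <- matrix(NA_real_, k, k, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # the matrix is symmetric, so fill both triangles at once
      p[i, j] <- p[j, i] <- cor.test(x[, i], x[, j], method = method)$p.value
    }
  }
  p  # diagonal stays NA
}

x <- as.matrix(mtcars[, c("mpg", "wt", "hp")])
r <- cor(x)
p <- cor_pvalues(x)

# Merge both matrices into one character matrix: formatted coefficient + stars
stars <- ifelse(is.na(p), "",
                ifelse(p < .001, "***", ifelse(p < .01, "**", ifelse(p < .05, "*", ""))))
combined <- matrix(paste0(formatC(r, format = "f", digits = 2), stars),
                   nrow = nrow(r), dimnames = dimnames(r))
combined
```

Note that this runs one `cor.test` per pair, which is fine for a table-sized number of variables.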

To **spare future scholars the struggle** of early-day R programming, I would like to share my *custom function* `correlation_matrix`.

My `correlation_matrix` takes in a *dataframe*, selects only the numeric (and boolean/logical) columns, calculates the correlation coefficients and p-values, and outputs a **fully formatted publication-ready correlation matrix**!

You can specify **many formatting options** in `correlation_matrix`.

For instance, you can use only 2 decimals. You can focus on the lower triangle *(as the lower and upper triangle values are identical)*. And you can drop the diagonal values:
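Masking a triangle only takes base R's `upper.tri` / `lower.tri`. A standalone sketch of the idea (2 decimals, lower triangle only, diagonal dropped):

```r
# Format the coefficients as strings with 2 decimals (dimnames are kept)
m <- formatC(cor(mtcars[, c("mpg", "wt", "hp")]), format = "f", digits = 2)

# Show only the lower triangle: blank out the upper triangle *and* the diagonal
m[upper.tri(m, diag = TRUE)] <- ""
m
```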

Or maybe you are interested in a **different type of correlation coefficients**, and not so much in significance levels:
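The `type` argument is handed to `Hmisc::rcorr`, which offers `"pearson"` and `"spearman"`; base R's `cor` exposes the same choice through its `method` argument. A small sketch of why the choice matters on a monotone but non-linear relationship:

```r
x <- 1:10
y <- x^3  # monotone, but not linear

cor(x, y, method = "pearson")   # below 1: the relationship is not linear
cor(x, y, method = "spearman")  # 1: the ranks agree perfectly
```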

For other formatting options, do have a look at the **source code below**.

Now, to make matters **even easier**, I wrote a second function (`save_correlation_matrix`) to directly save any correlation matrix you create:

Once you open your new correlation matrix file in Excel, it is **immediately ready** to be copy-pasted into Word!
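`save_correlation_matrix` writes with `write.csv2`, which separates fields with a semicolon, the dialect Excel typically expects in locales that use a comma as decimal mark; if your Excel expects commas as separator, `write.csv` may be the better fit. A sketch of what lands in the file (a temporary file here):

```r
m <- formatC(cor(mtcars[, c("mpg", "wt")]), format = "f", digits = 2)

# write.csv2() uses ";" as field separator
path <- tempfile(fileext = ".csv")
write.csv2(m, file = path)
readLines(path)

# Read it back with the matching reader; the first column holds the row names
back <- read.csv2(path, row.names = 1)
```

Because the cells are already formatted strings, `write.csv2`'s comma decimal setting does not change them; the decimal mark in the file is whatever `formatC` produced, which is what the `decimal.mark` argument of `correlation_matrix` controls.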

If you are looking for ways to **visualize** your correlations, do have a look at the packages `corrr` and `corrplot`.

**I hope my functions are of help to you!**

Do reach out if you get to use them in any of your research papers!

I would be super interested and feel honored.

`correlation_matrix`

```r
#' correlation_matrix
#' Creates a publication-ready / formatted correlation matrix, using `Hmisc::rcorr` in the backend.
#'
#' @param df dataframe; containing numeric and/or logical columns to calculate correlations for
#' @param type character; specifies the type of correlations to compute; gets passed to `Hmisc::rcorr`; options are `"pearson"` or `"spearman"`; defaults to `"pearson"`
#' @param digits integer/double; number of decimals to show in the correlation matrix; gets passed to `formatC`; defaults to `3`
#' @param decimal.mark character; which decimal.mark to use; gets passed to `formatC`; defaults to `.`
#' @param use character; which part of the correlation matrix to display; options are `"all"`, `"upper"`, `"lower"`; defaults to `"all"`
#' @param show_significance boolean; whether to add `*` to represent the significance levels for the correlations; defaults to `TRUE`
#' @param replace_diagonal boolean; whether to replace the correlations on the diagonal; defaults to `FALSE`
#' @param replacement character; what to replace the diagonal and/or upper/lower triangles with; defaults to `""` (empty string)
#'
#' @return a correlation matrix
#' @export
#'
#' @examples
#' correlation_matrix(iris)
#' correlation_matrix(mtcars)
correlation_matrix <- function(df,
                               type = "pearson",
                               digits = 3,
                               decimal.mark = ".",
                               use = "all",
                               show_significance = TRUE,
                               replace_diagonal = FALSE,
                               replacement = "") {

  # check arguments; `exprs =` makes stopifnot() test every expression in the block
  stopifnot(exprs = {
    type %in% c("pearson", "spearman")
    is.numeric(digits)
    digits >= 0
    use %in% c("all", "upper", "lower")
    is.logical(replace_diagonal)
    is.logical(show_significance)
    is.character(replacement)
  })

  # we need the Hmisc package for this
  if (!requireNamespace("Hmisc", quietly = TRUE)) {
    stop("Package 'Hmisc' is required: install.packages('Hmisc')")
  }

  # retain only numeric and boolean columns
  isNumericOrBoolean <- vapply(df, function(x) is.numeric(x) || is.logical(x), logical(1))
  if (any(!isNumericOrBoolean)) {
    cat('Dropping non-numeric/-boolean column(s):',
        paste(names(isNumericOrBoolean)[!isNumericOrBoolean], collapse = ', '), '\n\n')
  }
  df <- df[isNumericOrBoolean]

  # transform input data frame to matrix
  x <- as.matrix(df)

  # run correlation analysis using Hmisc package
  correlation_matrix <- Hmisc::rcorr(x, type = type)
  R <- correlation_matrix$r # Matrix of correlation coefficients
  p <- correlation_matrix$P # Matrix of p-values

  # transform correlations to specific character format
  Rformatted <- formatC(R, format = 'f', digits = digits, decimal.mark = decimal.mark)

  # if there are any negative numbers, put a space before the non-negatives to align all
  if (any(R < 0, na.rm = TRUE)) {
    Rformatted <- ifelse(!is.na(R) & R >= 0, paste0(' ', Rformatted), Rformatted)
  }

  # add significance levels if desired
  if (show_significance) {
    # define notions for significance levels; spacing is important (3 characters each).
    stars <- ifelse(is.na(p), "   ",
                    ifelse(p < .001, "***",
                           ifelse(p < .01, "** ",
                                  ifelse(p < .05, "*  ", "   "))))
    Rformatted <- paste0(Rformatted, stars)
  }

  # build a new matrix that includes the formatted correlations and their significance stars
  Rnew <- matrix(Rformatted, ncol = ncol(x))
  rownames(Rnew) <- colnames(x)
  colnames(Rnew) <- paste(colnames(x), "", sep = " ")

  # replace undesired values
  if (use == 'upper') {
    Rnew[lower.tri(Rnew, diag = replace_diagonal)] <- replacement
  } else if (use == 'lower') {
    Rnew[upper.tri(Rnew, diag = replace_diagonal)] <- replacement
  } else if (replace_diagonal) {
    diag(Rnew) <- replacement
  }

  return(Rnew)
}
```
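To see what the `digits` and `decimal.mark` arguments and the alignment step do in isolation, here is a small sketch on a plain vector, mirroring the formatting steps in `correlation_matrix` (the input values are made up for illustration):

```r
r <- c(0.512, -0.034, 1)

# digits and decimal.mark are forwarded to formatC()
fmt <- formatC(r, format = "f", digits = 2, decimal.mark = ",")

# pad non-negative values with a leading space so they line up with negative ones
aligned <- ifelse(r >= 0, paste0(" ", fmt), fmt)
aligned
```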

`save_correlation_matrix`

```r
#' save_correlation_matrix
#' Creates and saves to file a fully formatted correlation matrix, using `correlation_matrix` and `Hmisc::rcorr` in the backend
#' @param df dataframe; passed to `correlation_matrix`
#' @param filename either a character string naming a file or a connection open for writing. "" indicates output to the console; passed to `write.csv2`
#' @param ... any other arguments passed to `correlation_matrix`
#'
#' @return NULL (invisibly); called for its side effect of writing the file
#'
#' @examples
#' save_correlation_matrix(df = iris, filename = 'iris-correlation-matrix.csv')
#' save_correlation_matrix(df = mtcars, filename = 'mtcars-correlation-matrix.csv', digits = 3, use = 'lower')
save_correlation_matrix <- function(df, filename, ...) {
  write.csv2(correlation_matrix(df, ...), file = filename)
}
```
