Multilevel Correlations: A New Method for Common Problems

[This article was first published on R on easystats, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this tutorial, we will introduce multilevel correlations (also called hierarchical or random-effects correlations) and show how to compute them using the new correlation package from the easystats suite.

You can install the updated version and load the package as follows:

install.packages("correlation")
library(correlation)

Data

Imagine we have an experiment in which 10 individuals each completed a task with 100 trials. For each of the 1000 total trials, we measured two things, V1 and V2, and our research aims to investigate the link between these two variables.

We will generate data using the simulate_simpson() function from the correlation package installed above.

data <- simulate_simpson(n=100, groups=10)
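
To get a feel for what kind of data this produces, here is a rough base-R sketch. The helper make_simpson_data() and its parameters are hypothetical illustrations, not the package's implementation: each group gets a positive within-group correlation, and the group centres are then shifted in opposite directions on V1 and V2.

```r
# Hypothetical helper (not the package's code): simulate grouped data
# that displays Simpson's paradox, using base R only.
make_simpson_data <- function(n = 100, groups = 10, r = 0.5, spacing = 1) {
  out <- lapply(seq_len(groups), function(g) {
    x <- rnorm(n)                              # within-group V1
    y <- r * x + sqrt(1 - r^2) * rnorm(n)      # within-group V2, correlation ~ r
    data.frame(V1 = x + g * spacing,           # group centres move right on V1...
               V2 = y - g * spacing,           # ...and down on V2 (negative trend)
               Group = paste0("G_", g),
               stringsAsFactors = FALSE)
  })
  sim <- do.call(rbind, out)
  sim$Group <- factor(sim$Group, levels = paste0("G_", seq_len(groups)))
  sim
}

set.seed(123)
sim <- make_simpson_data()

cor(sim$V1, sim$V2)                                  # pooled: strongly negative
by_group <- sapply(split(sim, sim$Group),
                   function(d) cor(d$V1, d$V2))      # one r per group
by_group                                             # all positive
```

With these settings the pooled correlation should come out strongly negative, while every per-group correlation is positive.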

Now let’s visualize the two variables:

library(ggplot2)

ggplot(data, aes(x=V1, y=V2)) + 
  geom_point() +
  geom_smooth(colour="black", method="lm", se=FALSE) +
  theme_classic()

That looks pretty straightforward! There appears to be a negative correlation between V1 and V2. Let’s test this.

Simple correlation

correlation(data)
## Parameter1 | Parameter2 |     r |         95% CI |      t |  df |      p |  Method | n_Obs
## ------------------------------------------------------------------------------------------
## V1         |         V2 | -0.84 | [-0.86, -0.82] | -48.77 | 998 | < .001 | Pearson |  1000

Indeed, there is a strong, negative and significant correlation between V1 and V2. Great, can we go ahead and publish these results in PNAS?
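
The t, df and CI columns in the table above follow from r and the sample size. As a hedged sketch, the hypothetical helper r_summary() below applies the standard formulas for a Pearson test (t = r * sqrt(n - 2) / sqrt(1 - r^2), and a Fisher z interval, which is also what base R's cor.test() uses) to the rounded r = -0.84 and n = 1000:

```r
# Hypothetical helper: recover t, df, p and a CI from a Pearson r and n,
# using the same formulas as stats::cor.test() for method = "pearson".
r_summary <- function(r, n, conf = 0.95) {
  df <- n - 2
  t <- r * sqrt(df) / sqrt(1 - r^2)            # t statistic for H0: rho = 0
  p <- 2 * pt(-abs(t), df)                     # two-sided p-value
  z <- atanh(r)                                # Fisher z-transform of r
  half <- qnorm((1 + conf) / 2) / sqrt(n - 3)  # CI half-width on the z scale
  list(t = t, df = df, p = p,
       ci = tanh(z + c(-1, 1) * half))         # back-transform to the r scale
}

res <- r_summary(-0.84, n = 1000)
res$t             # about -48.9
round(res$ci, 2)  # -0.86 -0.82
```

This gives t of about -48.9 on 998 df and a 95% CI of [-0.86, -0.82]. The small gap from the printed t of -48.77 comes from r being rounded to two decimals in the table.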

Simpson’s Paradox

Hold on, sunshine! Ever heard of something called Simpson’s paradox?

Let’s colour our data points by group (i.e., by individual):

library(ggplot2)

ggplot(data, aes(x=V1, y=V2)) + 
  geom_point(aes(colour=Group)) +
  geom_smooth(aes(colour=Group), method="lm", se=FALSE) + 
  geom_smooth(colour="black", method="lm", se=FALSE) + 
  theme_classic()

Mmh, interesting. It seems that within each subject the relationship is actually positive, the opposite of the overall trend. The negative overall trend appears to be created by differences between the groups, and could be spurious!

Multilevel (as in multi-group) correlations allow us to account for differences between groups. They are based on partialling out the group variable, which is entered as a random factor in a mixed linear regression.
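
The idea of partialling out the group can be sketched in a few lines of base R. This is a simplified stand-in, not the package's method: instead of a random intercept in a mixed model, it removes each group's mean from V1 and V2 (a fixed-effects version of the same idea). With many trials per group, the two approaches should give similar correlations. The simulated data use hypothetical parameters chosen to mimic the tutorial's setup.

```r
# Simulated stand-in for the tutorial's data (hypothetical parameters):
# V1 and V2 correlate positively within each group, but the group
# centres drift in opposite directions, so the pooled trend is negative.
set.seed(42)
n_groups <- 10
n_trials <- 100
Group <- factor(rep(seq_len(n_groups), each = n_trials))
g <- as.integer(Group)
x <- rnorm(n_groups * n_trials)
V1 <- x + g
V2 <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n_groups * n_trials) - g

# Partial out the group by subtracting each group's mean
# (ave() returns the group mean for every observation).
V1_within <- V1 - ave(V1, Group)
V2_within <- V2 - ave(V2, Group)

cor(V1, V2)                # pooled: negative, driven by group differences
cor(V1_within, V2_within)  # within groups: positive, close to 0.5
```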

You can compute them with the correlation package by setting the multilevel argument to TRUE.

correlation(data, multilevel = TRUE)
## Parameter1 | Parameter2 |    r |           CI |     t |  df |      p |  Method | n_Obs
## --------------------------------------------------------------------------------------
## V1         |         V2 | 0.50 | [0.45, 0.55] | 18.23 | 998 | < .001 | Pearson |  1000

Dayum! We were too hasty in our conclusions! Taking the group into account seems to be super important.

Note: In this simple case, where only two variables are of interest, it would of course be best to proceed directly with a mixed regression model instead of correlations. That being said, multilevel correlations can be useful for exploratory analysis, when multiple variables are of interest, or in combination with a network or structural approach.
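
For the two-variable case, a minimal mixed-regression sketch might look like the following. It assumes the lme4 package (the post does not name a modelling package) and uses a hypothetical simulated stand-in for the tutorial's data; the model has a fixed slope for V1 and a random intercept for each Group.

```r
library(lme4)  # assumed installed: install.packages("lme4")

# Simulated stand-in for the tutorial's data (hypothetical parameters)
set.seed(42)
dat <- data.frame(Group = factor(rep(1:10, each = 100)))
g <- as.integer(dat$Group)
x <- rnorm(1000)
dat$V1 <- x + g                                   # group centres shift right...
dat$V2 <- 0.5 * x + sqrt(0.75) * rnorm(1000) - g  # ...and down

# Ordinary regression ignores the grouping
naive <- lm(V2 ~ V1, data = dat)

# Mixed model: fixed slope for V1, random intercept for each Group
mixed <- lmer(V2 ~ V1 + (1 | Group), data = dat)

coef(naive)["V1"]   # negative: the between-group trend
fixef(mixed)["V1"]  # positive: the within-group relationship
```

summary(mixed) shows the full output, including the estimated variance of the group intercepts.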

Get Involved

easystats is a new project in active development, looking for contributors and supporters. So do not hesitate to contact us if you want to get involved 🙂

  • Check out our other blog posts here!

Stay tuned

To stay updated about upcoming features and cool R or data science stuff, you can follow the packages on GitHub (click on one of the easystats packages, then on the Watch button in the top right corner) as well as the easystats team on Twitter and online.
