[This article was first published on R – Dataviz – Stats – Bayes, and kindly contributed to R-bloggers.]

This is about some academic work I did that never got published, but I think it should be out there as you might find it useful. Here’s the motivation. You measure three variables, A, B, and Z. You are interested in whether the correlation between A and Z is different to that between B and Z. Of course, the two correlations share Z and come from the same sample, so their sampling distributions are not independent, and it’s a little complicated. Then, maybe you want to use a more robust correlation statistic: Spearman’s or Kendall’s or whatever. What now? I’m going to tell you about how this came to my attention and what we did. Then, you can read the manuscript, use the code files, gaze at the figures…

A few years ago, when I was teaching statistics and research methods at St George’s, University of London, or St George’s Medical School as most people call it, I was supervisor to Melissa Jacobs, a talented and dedicated occupational therapist working with people who had an arm amputated and were learning to use myoelectric prosthetic arms. These pick up small movements around your shoulder blade, and the arm and hand flex and pinch accordingly. Crucially, the therapist needs to know how the patient is progressing from one week to the next, and what activities they find hard or easy. There is a gold-standard test for this called ACMC (Assessment of Capacity for Myoelectric Control), with a suitcase full of objects such as the patient will encounter in real life. Unfortunately, it is expensive to set up and maintain, and the training for the therapist is expensive and time-consuming. There is also a quick and easy test called Box And Blocks, which uses a box with a central barrier (like a backgammon set, if you like) and wooden blocks. Melissa devised a Modified Box And Blocks, which could also be made by anyone handy with a saw and required minimal training, but which allowed more gradation of the challenge for the patient. That’s the whole subject of her Master’s dissertation, and you can read more there (you probably have to contact the SGUL library to get a temporary login, or they might email it to you).

So, the challenge was to compare the correlation of Box And Blocks vs ACMC with the correlation of Modified Box And Blocks vs ACMC. It’s not the only consideration when validating a new outcome measure, but it’s the one I’m focussing on here. This is the problem of dependent correlations. As you’ll see in the paper, some tests already exist, but they usually use Pearson’s correlation coefficient only, assume multivariate normality, etc. Clinical outcomes often exhibit floor/ceiling effects, so I didn’t want to assume anything unnecessary. At the same time, my students were not training to be statisticians, and needed something accessible and easy to use and describe. Randomisation tests came to mind. I wrote to the guru on this subject, Philip Good, who very kindly sent back a lot of advice and details of one approach he had taken.

Good’s test replaces values of Z with bootstrap resamples of Z (sampling with replacement). This breaks the correlation between A and Z, but also between B and Z. That gives you a way of generating many samples (and thus an empirical distribution function for test statistics) under a null hypothesis H0: Corr(A,Z)=Corr(B,Z)=0. It’s that zero that concerned me, so I thought I’d try a label-swapping approach, where pairs of A and B values swap places at random, which tests H0: Corr(A,Z)=Corr(B,Z). In practice, the empirical correlations are not identical and the randomised datasets have a common correlation somewhere between the two that are actually observed. There had been a small evaluation of this approach before, but only for Pearson’s, under normality and with one particular value of the correlation. Label-swapping is quite common in randomisation tests.
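The label-swapping idea can be sketched in a few lines of Python. This is a minimal illustration rather than the code in the zip file: the function names `pearson` and `label_swap_test`, the absolute difference in correlations as the test statistic, the two-sided p-value, and the default of 2,000 randomisations are all illustrative assumptions, not details taken from the manuscript. It also assumes A and B are on comparable scales (for example, standardised or ranked beforehand), since their values get exchanged.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def label_swap_test(a, b, z, corr=pearson, n_perm=2000, seed=None):
    """Randomisation p-value for H0: Corr(A,Z) = Corr(B,Z).

    Assumed test statistic: |corr(A,Z) - corr(B,Z)| (hypothetical choice).
    """
    rng = random.Random(seed)
    observed = abs(corr(a, z) - corr(b, z))
    extreme = 0
    for _ in range(n_perm):
        a_star, b_star = [], []
        for ai, bi in zip(a, b):
            # Swap this subject's A and B values with probability 1/2;
            # Z stays where it is.
            if rng.random() < 0.5:
                ai, bi = bi, ai
            a_star.append(ai)
            b_star.append(bi)
        if abs(corr(a_star, z) - corr(b_star, z)) >= observed:
            extreme += 1
    # Count the observed data as one of the randomisations.
    return (extreme + 1) / (n_perm + 1)
```

Any function that takes two sequences and returns a correlation, such as a rank-based coefficient, could be passed as `corr` in place of `pearson`.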

I ran a simulation in Julia of datasets with correlations from all compatible trios of {0.1, 0.3, 0.5, 0.7, 0.9}. It was the high-water mark of my Julia usage, although this year I am getting back into it. In a nutshell, the label-swapping test is the one to use. As I said in the paper, there are a number of reasons why randomisation tests can be helpful here:

robustness to skewness, floor or ceiling effects, applicability to correlation coefficients other than Pearson’s, and ease of understanding for students
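On the "compatible trios" used in the simulation: one natural reading, assumed here rather than taken from the manuscript, is that the three pairwise correlations must form a valid (positive semi-definite) correlation matrix. For a 3×3 matrix with ones on the diagonal, that comes down to a non-negative determinant. The names `LEVELS` and `is_compatible` below are hypothetical, and the sketch treats the trio as ordered (A–B, A–Z, B–Z).

```python
from itertools import product

LEVELS = (0.1, 0.3, 0.5, 0.7, 0.9)


def is_compatible(r_ab, r_az, r_bz, tol=1e-12):
    """True if the three correlations give a positive semi-definite matrix.

    Uses the determinant of [[1, r_ab, r_az], [r_ab, 1, r_bz], [r_az, r_bz, 1]].
    With each |r| <= 1 the 2x2 minors are already non-negative, so the
    determinant condition is sufficient.
    """
    det = 1 + 2 * r_ab * r_az * r_bz - r_ab**2 - r_az**2 - r_bz**2
    return det >= -tol


# All ordered trios of the five levels that could actually occur together.
trios = [t for t in product(LEVELS, repeat=3) if is_compatible(*t)]
```

For example, two variables that each correlate 0.9 with Z cannot correlate only 0.1 with each other, so that trio is excluded.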

Here’s a zip file with the manuscript (which got rejected), various Julia (v 0.4) and R (v 3.1.2) code files, image files for the large multi-panel charts (too big to print), and an Excel spreadsheet which does the label-swapping. You can plug your data into the spreadsheet and read the p-value off it. Our students were taught the SPSS GUI at that point and we hadn’t brought R in as an option yet, so the spreadsheet is what Melissa used with the actual clinical data, and you are welcome to use it too for your purposes. All contents of that file are provided under the terms and conditions of The Unlicense. If you have trouble using it, get in touch and I’ll try to help.