(This article was first published on **Serious Stats » R code**, and kindly contributed to R-bloggers)

In Chapter 6 (correlation and covariance) I consider how to construct a confidence interval (CI) for the difference between two independent correlations. The standard approach uses the Fisher *z* transformation to deal with boundary effects (the squashing of the distribution and increasing asymmetry as *r* approaches -1 or 1). As *z*_{r} is approximately normally distributed (which *r* is decidedly not) you can create a standard error for the difference by summing the sampling variances according to the variance sum law (see chapter 3).
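As a rough illustration of this standard approach (a sketch, not code from the book), the interval can be built on the *z* scale by adding the two sampling variances, each 1/(*n* - 3). The function name fisher.z.diff() and the input values are hypothetical, and the resulting limits are for the difference in *z*, not the difference in *r*.

```r
# Hypothetical helper: CI for the difference between two independent
# correlations on the Fisher z scale (the standard approach).
fisher.z.diff <- function(r1, r2, n1, n2 = n1, conf.level = 0.95) {
    z.diff <- atanh(r1) - atanh(r2)    # Fisher z is atanh(r)
    # variance sum law: the sampling variances 1/(n - 3) add
    se.diff <- sqrt(1/(n1 - 3) + 1/(n2 - 3))
    moe <- qnorm(1 - (1 - conf.level)/2) * se.diff
    c(lower = z.diff - moe, upper = z.diff + moe)
}

# made-up inputs for illustration
fisher.z.diff(0.5, 0.2, n1 = 50, n2 = 60)
```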

This works well for the CI around a single correlation (assuming the main assumptions – bivariate normality and homogeneity of variance – broadly hold) or for differences between means, but can perform badly when looking at the difference between two correlations. Zou (2007) proposed a modification to the standard approach that uses the upper and lower bounds of the CIs for the individual correlations to calculate a CI for their difference. He considered three cases: independent correlations and two types of dependent correlations (overlapping and non-overlapping). He also considered differences in *R*^{2} (not relevant here).

*Independent correlations*

In section 6.6.2 (*p*. 224) I illustrate Zou’s approach for independent correlations and provide R code in sections 6.7.5 and 6.7.6 to automate the calculations. Section 6.7.5 shows how to write a simple R function and illustrates it with a function to calculate a CI for Pearson’s *r* using the Fisher *z* transformation. Whilst writing the book I encountered several functions that do exactly this. The cor.test() function in the stats package (part of the base R distribution) does this for raw data (along with computing the correlation and the usual NHST). A number of other functions compute it using the usual textbook formula. My function, rz.ci(), relies on R’s primitive hyperbolic functions (the Fisher *z* transformation is the inverse hyperbolic tangent, atanh() in R, and so is related to the geometry of hyperbolas), which may be useful if you need to use it intensively (e.g., for simulations).
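A minimal sketch of such a function follows. It is written to match how rz.ci() is called elsewhere in this post (a correlation, a sample size, a conf.level argument, and a lower/upper pair returned), so the body is a reconstruction under those assumptions and not necessarily the listing from section 6.7.5. The simulated data are invented purely for the comparison with cor.test().

```r
# Sketch of rz.ci(): CI for Pearson's r via the Fisher z transformation.
# atanh() gives Fisher's z; tanh() transforms back to the r scale.
rz.ci <- function(r, N, conf.level = 0.95) {
    zr.se <- 1/(N - 3)^0.5                         # standard error of z
    moe <- qnorm(1 - (1 - conf.level)/2) * zr.se   # margin of error on the z scale
    zu <- atanh(r) + moe
    zl <- atanh(r) - moe
    tanh(c(zl, zu))
}

# compare with cor.test() on simulated (made-up) data
set.seed(1)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
rz.ci(cor(x, y), 40)
cor.test(x, y)$conf.int
```

The two intervals should agree, because cor.test() also builds its Pearson interval from the Fisher *z* transformation.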

The function in section 6.7.6 uses the rz.ci() function to construct a CI for the difference between two independent correlations. See section 6.6.2 of *Serious stats* or Zou (2007) for further details and a worked example. My function from section 6.7.6 is reproduced here:

    r.ind.ci <- function(r1, r2, n1, n2 = n1, conf.level = 0.95) {
        L1 <- rz.ci(r1, n1, conf.level = conf.level)[1]
        U1 <- rz.ci(r1, n1, conf.level = conf.level)[2]
        L2 <- rz.ci(r2, n2, conf.level = conf.level)[1]
        U2 <- rz.ci(r2, n2, conf.level = conf.level)[2]
        lower <- r1 - r2 - ((r1 - L1)^2 + (U2 - r2)^2)^0.5
        upper <- r1 - r2 + ((U1 - r1)^2 + (r2 - L2)^2)^0.5
        c(lower, upper)
    }

To call the function, supply the two correlation coefficients and the sample size(s) as input (the default is to assume equal *n* and a 95% CI).
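For instance, calls might look like the following. The block repeats the definitions so that it runs by itself; the compact rz.ci() body is a stand-in written to match how the function is called (an assumption, not the book's listing), and the correlations and sample sizes are invented.

```r
# Stand-in for rz.ci() (assumed form: Fisher z interval, back-transformed)
rz.ci <- function(r, N, conf.level = 0.95) {
    moe <- qnorm(1 - (1 - conf.level)/2) / sqrt(N - 3)
    tanh(atanh(r) + c(-moe, moe))
}

# r.ind.ci() as given in this post
r.ind.ci <- function(r1, r2, n1, n2 = n1, conf.level = 0.95) {
    L1 <- rz.ci(r1, n1, conf.level = conf.level)[1]
    U1 <- rz.ci(r1, n1, conf.level = conf.level)[2]
    L2 <- rz.ci(r2, n2, conf.level = conf.level)[1]
    U2 <- rz.ci(r2, n2, conf.level = conf.level)[2]
    lower <- r1 - r2 - ((r1 - L1)^2 + (U2 - r2)^2)^0.5
    upper <- r1 - r2 + ((U1 - r1)^2 + (r2 - L2)^2)^0.5
    c(lower, upper)
}

# equal n in both samples (n2 defaults to n1), 95% CI
r.ind.ci(0.5, 0.2, 50)

# unequal n and a 99% CI
r.ind.ci(0.5, 0.2, n1 = 50, n2 = 80, conf.level = 0.99)
```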

*A caveat*

As I point out in chapter 6, just because you can compare two correlation coefficients doesn’t mean it is a good idea. Correlations are standardized simple linear regression coefficients and even if the two regression coefficients measure the same effect, it doesn’t follow that their standardized counterparts do. This is not merely the problem that it may be meaningless to compare, say, a correlation between height and weight with a correlation between anxiety and neuroticism. Two correlations between the same variables in different samples might not be meaningfully comparable (e.g., because of differences in reliability, range restriction and so forth).

*Dependent overlapping correlations*

In many cases the correlations you want to compare aren’t independent. One reason for this is that the correlations share a common variable. For example, if you correlate *X* with *Y* and *X* with *Z* you might be interested in whether the correlation *r*_{XY} is larger than *r*_{XZ}. As *X* is common to both correlations, the correlations are not independent. Zou (2007) describes how to adjust the interval to account for this correlation. In essence the sampling variances of the correlations are tweaked using a version of the variance sum law (again see chapter 3).
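Written out from the r.dol.ci() and rho.rxy.rxz() code in this post, the adjusted limits take the following form. The notation is chosen here for illustration: (l_1, u_1) and (l_2, u_2) stand for the Fisher *z* based limits for *r*_{XY} and *r*_{XZ}, and \hat\rho for the estimated correlation between the two correlations.

```latex
\begin{aligned}
L &= r_{XY} - r_{XZ} - \sqrt{(r_{XY} - l_1)^2 + (u_2 - r_{XZ})^2
      - 2\hat\rho\,(r_{XY} - l_1)(u_2 - r_{XZ})} \\
U &= r_{XY} - r_{XZ} + \sqrt{(u_1 - r_{XY})^2 + (r_{XZ} - l_2)^2
      - 2\hat\rho\,(u_1 - r_{XY})(r_{XZ} - l_2)} \\
\hat\rho &= \frac{\left(r_{YZ} - \tfrac{1}{2} r_{XY} r_{XZ}\right)
      \left(1 - r_{XY}^2 - r_{XZ}^2 - r_{YZ}^2\right) + r_{YZ}^3}
      {\left(1 - r_{XY}^2\right)\left(1 - r_{XZ}^2\right)}
\end{aligned}
```

Setting \hat\rho to zero recovers the limits used for independent correlations.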

The following functions (not in the book) compute the correlation between the correlations and use it to adjust the CI for the difference in correlations to account for overlap (a shared predictor). Note that both functions and rz.ci() must be loaded into R. Also included is a call to the main function that reproduces the output from example 2 of Zou (2007).

    rho.rxy.rxz <- function(rxy, rxz, ryz) {
        num <- (ryz - 1/2 * rxy * rxz) * (1 - rxy^2 - rxz^2 - ryz^2) + ryz^3
        den <- (1 - rxy^2) * (1 - rxz^2)
        num/den
    }

    r.dol.ci <- function(r12, r13, r23, n, conf.level = 0.95) {
        L1 <- rz.ci(r12, n, conf.level = conf.level)[1]
        U1 <- rz.ci(r12, n, conf.level = conf.level)[2]
        L2 <- rz.ci(r13, n, conf.level = conf.level)[1]
        U2 <- rz.ci(r13, n, conf.level = conf.level)[2]
        rho.r12.r13 <- rho.rxy.rxz(r12, r13, r23)
        lower <- r12 - r13 - ((r12 - L1)^2 + (U2 - r13)^2 -
            2 * rho.r12.r13 * (r12 - L1) * (U2 - r13))^0.5
        upper <- r12 - r13 + ((U1 - r12)^2 + (r13 - L2)^2 -
            2 * rho.r12.r13 * (U1 - r12) * (r13 - L2))^0.5
        c(lower, upper)
    }

    # input from example 2 of Zou (2007, p.409)
    r.dol.ci(.396, .179, .088, 66)

The r.dol.ci() function takes three correlations as input – the correlations of interest (e.g., *r*_{XY} and *r*_{XZ}) and the correlation between the non-overlapping variables (e.g., *r*_{YZ}). Also required is the sample size (often identical for both correlations).

*Dependent non-overlapping correlations*

Overlapping correlations are not the only cause of dependency between correlations. The samples themselves could be correlated. Zou (2007) gives the example of a correlation between two variables for a sample of mothers. The same correlation could be computed for their children. As the children and mothers have correlated scores on each variable, the two correlations will themselves be correlated (but not overlapping in the sense used earlier). The following functions compute the CI for the difference between dependent non-overlapping correlations. Also included is a call to the main function that reproduces Zou (2007) example 3.

    rho.rab.rcd <- function(rab, rac, rad, rbc, rbd, rcd) {
        num <- 1/2 * rab * rcd * (rac^2 + rad^2 + rbc^2 + rbd^2) +
            rac * rbd + rad * rbc -
            (rab * rac * rad + rab * rbc * rbd + rac * rbc * rcd + rad * rbd * rcd)
        den <- (1 - rab^2) * (1 - rcd^2)
        num/den
    }

    r.dnol.ci <- function(r12, r13, r14, r23, r24, r34, n12, n34 = n12, conf.level = 0.95) {
        L1 <- rz.ci(r12, n12, conf.level = conf.level)[1]
        U1 <- rz.ci(r12, n12, conf.level = conf.level)[2]
        L2 <- rz.ci(r34, n34, conf.level = conf.level)[1]
        U2 <- rz.ci(r34, n34, conf.level = conf.level)[2]
        rho.r12.r34 <- rho.rab.rcd(r12, r13, r14, r23, r24, r34)
        lower <- r12 - r34 - ((r12 - L1)^2 + (U2 - r34)^2 -
            2 * rho.r12.r34 * (r12 - L1) * (U2 - r34))^0.5
        upper <- r12 - r34 + ((U1 - r12)^2 + (r34 - L2)^2 -
            2 * rho.r12.r34 * (U1 - r12) * (r34 - L2))^0.5
        c(lower, upper)
    }

    # from example 3 of Zou (2007, p.409-10)
    r.dnol.ci(.396, .208, .143, .023, .423, .189, 66)

Although this call reproduces the final output for example 3, it produces a slightly different intermediate result for the correlation between correlations (0.0891 from the function vs. 0.0917 in the article). Zou (personal communication) confirms that this is either a typo or rounding error (e.g., arising from hand calculation) in example 3 and that the function here produces accurate output. The input requires every pairwise correlation between the four variables being compared (and the relevant sample size for the correlations being compared). The easiest way to get the correlations is from a correlation matrix of the four variables.
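A short sketch of pulling the six correlations out of a correlation matrix, in the argument order r.dnol.ci() expects. The data frame, its column names and the simulated values are all invented for illustration. Note that R[upper.tri(R)] returns the values in a different order (r12, r13, r23, r14, r24, r34), so explicit indexing is the safer route.

```r
# Simulated (made-up) data: four variables measured on the same 66 cases.
set.seed(42)
n <- 66
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$x2 <- dat$x2 + 0.5 * dat$x1    # induce some correlation for illustration
dat$x4 <- dat$x4 + 0.5 * dat$x3

R <- cor(dat)    # 4 x 4 correlation matrix
round(R, 3)

# the six distinct correlations, indexed explicitly by row and column
r12 <- R["x1", "x2"]
r13 <- R["x1", "x3"]
r14 <- R["x1", "x4"]
r23 <- R["x2", "x3"]
r24 <- R["x2", "x4"]
r34 <- R["x3", "x4"]

# with rz.ci(), rho.rab.rcd() and r.dnol.ci() loaded, the call would be:
# r.dnol.ci(r12, r13, r14, r23, r24, r34, n)
```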

*Robust alternatives*

Wilcox (2009) describes a robust alternative to these methods for independent correlations and modifications to Zou’s method that make the dependent correlation methods robust to violations of bivariate normality and (in particular) homogeneity of variance assumptions. Wilcox provides R functions for these approaches on his web pages. His functions take raw data as input and are computationally intensive. For instance, the dependent correlation methods use Zou’s approach but take bootstrap CIs for the individual correlations as input (rather than the simpler Fisher *z* transformed versions).

The relevant functions are twopcor() for the independent case, TWOpov() for the dependent overlapping case and TWOpNOV() for the non-overlapping case.
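To illustrate the general idea of feeding resampling-based limits into Zou’s formula, here is a rough sketch for the overlapping case. It uses a plain percentile bootstrap and is not Wilcox’s code or his exact procedure; the function name boot.dol.ci(), the number of resamples and the simulated data are all assumptions made for the example.

```r
# Hypothetical helper (not one of Wilcox's functions): Zou-style CI for the
# difference r_xy - r_xz, using percentile bootstrap limits for each correlation.
boot.dol.ci <- function(x, y, z, conf.level = 0.95, B = 2000) {
    n <- length(x)
    probs <- c((1 - conf.level)/2, 1 - (1 - conf.level)/2)
    rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
    # resample whole cases so the dependence between the variables is kept
    r.star <- replicate(B, {
        i <- sample.int(n, n, replace = TRUE)
        c(cor(x[i], y[i]), cor(x[i], z[i]))
    })
    ci.xy <- quantile(r.star[1, ], probs, names = FALSE)
    ci.xz <- quantile(r.star[2, ], probs, names = FALSE)
    # correlation between the two correlations (same formula as rho.rxy.rxz)
    rho <- ((ryz - rxy * rxz/2) * (1 - rxy^2 - rxz^2 - ryz^2) + ryz^3) /
        ((1 - rxy^2) * (1 - rxz^2))
    lower <- rxy - rxz - sqrt((rxy - ci.xy[1])^2 + (ci.xz[2] - rxz)^2 -
        2 * rho * (rxy - ci.xy[1]) * (ci.xz[2] - rxz))
    upper <- rxy - rxz + sqrt((ci.xy[2] - rxy)^2 + (rxz - ci.xz[1])^2 -
        2 * rho * (ci.xy[2] - rxy) * (rxz - ci.xz[1]))
    c(lower, upper)
}

# simulated (made-up) data with a shared variable x
set.seed(123)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60)
z <- 0.2 * x + rnorm(60)
boot.ci <- boot.dol.ci(x, y, z, B = 1000)
boot.ci
```

Because every resample recomputes two correlations, the cost grows with the number of resamples B, which is one reason raw-data methods of this kind are slower than the Fisher *z* versions.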

UPDATE

Zou’s modified asymptotic method is easy enough that you can run it in Excel. I’ve added an Excel spreadsheet to the blog resources that should implement the methods (and matches the R output fairly closely). As it uses Excel it may not cope gracefully with some calculations (e.g., with extremely small or large values of *r* or other extreme cases) – and I have more confidence in the R code.

*References*

Baguley, T. (2012, in press). Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrave.

Zou, G. Y. (2007). Toward using confidence intervals to compare correlations. *Psychological Methods, 12,* 399-413.

Wilcox, R. R. (2009). Comparing Pearson correlations: Dealing with heteroscedasticity and non-normality. *Communications in Statistics – Simulation & Computation, 38*, 2220-2234.

