Showing a different approach to making statistical tests

Posted on September 5, 2016 by Florian Privé in R bloggers | 0 Comments

[This article was first published on Florian Privé, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post, I will talk about an alternative way to choose quantiles (and more broadly, decision boundaries) for statistical tests, the ones you choose in order to have a 95% confidence interval (5% of type-I error). I will then show that this idea can be used to combine tests. I will use some illustrations in R to make this clearer.

I got this idea during my internship last year on comparing many goodness-of-fit test statistics for the Weibull distribution in the case of type-II right censoring.

Example with the chi-squared distribution

Say that you have a test whose values under the null hypothesis (\(H_0\)) follow a chi-squared distribution with 10 degrees of freedom (\(\chi_{10}^2\)).

You can choose either

to reject \(H_0\) (with significance \(\alpha = 5\%\)) only for the largest values of the test statistic, which means rejecting the null hypothesis for values that are larger than the 95-percentile:

One-tailed test

or to reject \(H_0\) for both largest and smallest values of the statistic. Indeed, smallest values could be considered “too good to be true” (Stuart 1954). Then, \(H_0\) is rejected for values smaller than the 2.5-percentile or larger than the 97.5-percentile:

Two-tailed test

Why choosing? Why not letting the test choose by itself?

What do I mean by this? If you make the decision boundary on the test statistic’s density’s values (y-axis), not directly on the statistic’s values (x-axis), you always obtain a one-tailed test whatever is the distribution of the test statistic. Then, you reject \(H_0\) for all values that have a corresponding density lower than the 5-percentile:

Always a one-tailed test, but with respect to the y-axis

I see this as rejecting the 5% less probable values (“probable” in terms of the test statistic’s density). This may be a way to get unbiased tests.

Application to the combination of tests

Combining tests may be a way to create more powerful or robust tests.

First example: application in reliability

Say that you have two goodness-of-fit test statistics for the Weibull distribution (GOFWs) (a well-known distribution in survival analysis). How to combine them?

A priori, the best way I can see is to use their joint distribution. A 2D distribution has a density, as before, so we can find a threshold so that only 5% of this distribution’s values have a corresponding density under this threshold. This threshold is also called a 95%-contour.

Again, an image will be clearer than words. I drew several samples of size 50 from the Weibull distribution and three alternatives to the Weibull distribution: the Gamma, Log-Normal and Dhillon I distributions. For all these samples, I computed the corresponding values of the two GOFWs, and I plotted these paired values:

So, in black are the paired values for several samples of the Weibull distribution (the null hypothesis) and the alternatives are spread around. We have also in black the 95%-contour for \(H_0\). So, points outside of this boundary correspond to samples for which we reject the null hypothesis \(H_0\).

This gave one of the most powerful tests for the Weibull distribution.

For an example with R code, see the second example below.

Second example: application in genomics

Introduction

Cochran-Armitage Trend Tests (CATT) is well used in genomics to test for association between a single marker and a disease (Zheng et al. 2012, sec. 3.3.1). When the true genetic model is respectively the recessive (REC), additive (ADD), or dominant (DOM) model, the trend test ZCATT(x), where x is equal to 0, 1/2, or 1 respectively, gives powerful tests. Yet, the true model is generally unknown and choosing one specific value of x can lead to poor powers for some alternative models.

Then, the MAX3 statistic defined by \[MAX3 = \max\{|ZCATT(0)|, |ZCATT(1/2)|, |ZCATT(1)|\}\] can be used to have a more robust test (the power of the test remains good whatever is the underlying unkonwn model).

Yet, we could make another robust test based on the idea of the first section.

Simulation of values for these three statistics

I followed the algorithm detailed in (Zheng et al. 2012, sec. 3.9.1) to simulate contingency tables under different parameters as, for example, the genotype relative risk (GRR) \(\lambda_2\), the genetic model, the minor allele frequency (MAF) \(p\), etc.

source("https://privefl.github.io/blog/code/simu_counts.R")
source("https://privefl.github.io/blog/code/ZCATT.R")

Let us plot simulated values of three statistics, ZCATT(0), ZCATT(1/2) and ZCATT(1), by pairs. I will add to these plot two decision boundaries corresponding to the rejection of the null hypothesis for the statistics \(MAX2 = \max\{|S_1|, |S_2|\}\) (the square) and \(DENS2 = \hat{f}_{S_1, S_2}\) (the oval).

source("https://privefl.github.io/blog/code/square.R")
pacman::p_load(ks) 

# Set of parameters
LWD <- 3; PCH <- 19; CEX <- 0.5
XLIM <- YLIM <-  c(-3, 3)
models <- c("NULL", "REC", "ADD", "DOM")
n <- length(models)
NSIM <- 200
lambda2 <- c(1.5, 1/1.5)
p <- 0.3

# Get lots of simulated values for the three statistics under H0
counts <- simu_counts(nsim = 1e5, p = p)             
simus <- sapply(c(0, 0.5, 1), function(x) ZCATT(counts, x = x))
simus.save <- replace(simus, is.na(simus), 0)
colnames(simus.save) <- paste0("ZCATT(", c(0, 0.5, 1), ")")
  
# Plot by pairs 
for (ind in -(1:3)) {
  simus2D <- simus.save[, ind]
  # DENS2
  k <- ks::kde(simus2D)
  plot(k, cont = 95, col = n+1, lwd = LWD,
       xlim = XLIM, ylim = YLIM)
  # MAX2
  q <- quantile(apply(abs(simus2D), 1, max), 0.95) 
  square(q, col = 5, lwd = LWD)
  # H0 + Alternatives
  for (lam2 in lambda2) {
    for (i in 1:n) {
      counts <- simu_counts(nsim = NSIM, model = models[i], 
                            lam2 = lam2, p = p)
      simus <- sapply(c(0, 0.5, 1), function(x) ZCATT(counts, x = x))
      simus <- replace(simus, is.na(simus), 0)[, ind]
      points(simus, col = i, cex = CEX, pch = PCH, 
             lwd = ifelse(i == 1, LWD, 1)) 
    }
  }
  # Legend
  legend(x = "bottomright", legend = models, pch = PCH, col = 1:n)
}

Let us plot these three statistics’ values in 3D:

pacman::p_load(rgl, rglwidget) 

rgl::plot3d(x = simus.save[1:NSIM, ], size = 5, xlab = "ZCATT(0)",
            ylab = "ZCATT(1/2)", zlab = "ZCATT(1)")
for (lam2 in lambda2) {
  for (i in 2:length(models)) {
    counts <- simu_counts(nsim = NSIM, p = p, model = models[i], lam2 = lam2)
    simus <- sapply(c(0, 0.5, 1), function(x) ZCATT(counts, x = x))
    rgl::plot3d(x = simus, col = i, add = TRUE)
  }
}
rglwidget()

To leave a comment for the author, please follow the link and comment on their blog: Florian Privé.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.