# Distribution of T-Scores

March 2, 2013

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

Like most of my posts, these code snippets derive from various other projects. This example simulates how one can check whether a set of t statistics is distributed as expected. This can be useful when sampling known populations (e.g. U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election or exit polling). It is a simple example, but the concept can be expanded to include varying sample sizes and varying known mean values. When collecting data in real life, the number of samples (nsims in the code) will likely be only a handful rather than a million. In this example a constant sample size of 50 is used.
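As a rough sketch of that extension, the snippet below computes one t score per sample when every sample has its own size and its own known mean. The helper name `t_score`, the sample sizes, the means, and the choice of normal populations are all assumptions made for illustration. Because each score then has its own degrees of freedom, the scores are mapped through `pt()` so they can be compared on a common scale.

```r
# Hypothetical helper: t score of one sample against its known population mean.
t_score <- function(x, mu) (mean(x) - mu) / (sd(x) / sqrt(length(x)))

set.seed(42)
sizes <- c(30, 50, 80, 120)   # assumed sample sizes
mus   <- c(10, 0.2, 65, 3.5)  # assumed known population means

# One normal sample per (size, mean) pair; sd = 2 is arbitrary.
samples <- Map(function(n, mu) rnorm(n, mean = mu, sd = 2), sizes, mus)

t_vals <- mapply(t_score, samples, mus)  # one t score per sample
dfs    <- sizes - 1                      # degrees of freedom differ per sample

# Probability integral transform: if the t model holds, these values
# behave like draws from Uniform(0, 1), whatever the sample size.
u <- pt(t_vals, df = dfs)

data.frame(n = sizes, mu = mus, t = t_vals, df = dfs, u = u)
```

With only a handful of samples the `u` values will not say much on their own, but they can be accumulated over time and checked for uniformity.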

If you're collecting data and you see that the distribution of your t scores starts to deviate from the theoretical distribution, it might be time to tweak some of the algorithms.
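One possible way to put a number on "deviates from the known distribution" is a Kolmogorov-Smirnov test against the t distribution with n - 1 degrees of freedom. This is only a sketch and not necessarily the check used in the original projects: the normal population, the smaller `nsims`, the helper `sim_t`, and the deliberately wrong mean of 2.8 are all assumptions chosen to make the contrast visible.

```r
set.seed(1)
n <- 50
nsims <- 2000     # far fewer than one million, to keep the sketch quick
true_mu <- 3      # assumed population mean

# Hypothetical helper: simulate nsims t scores against a hypothesised mean mu0.
sim_t <- function(mu0) {
  replicate(nsims, {
    x <- rnorm(n, mean = true_mu)
    (mean(x) - mu0) / (sd(x) / sqrt(n))
  })
}

t_ok  <- sim_t(true_mu)  # scored against the correct known mean
t_bad <- sim_t(2.8)      # scored against a wrong "known" mean

# Compare each set of scores with the t distribution on n - 1 df.
ks_ok  <- ks.test(t_ok,  "pt", df = n - 1)
ks_bad <- ks.test(t_bad, "pt", df = n - 1)

c(ok = ks_ok$p.value, bad = ks_bad$p.value)
```

A very small p-value for the mis-specified case signals that the scores no longer match the reference distribution. With real data and only a few samples the test has little power, so it is better treated as a warning light than as proof.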


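set.seed(1234)
nsims <- 1000000 # number of simulated samples (about 400 MB of doubles; reduce if memory is tight)
n <- 50 # observations per sample
# Each column of x is one sample from an exponential population with rate 5 (true mean 1/5)
x <- replicate(nsims, rexp(n, 5))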

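x.sd <- apply(x, 2, sd) # per-sample standard deviation
x.mean <- colMeans(x) # per-sample mean

# Centre on the known population mean: rexp(n, 5) has mean 1/5, not 0
mu <- 1/5
x.t <- (x.mean - mu)/(x.sd/sqrt(nrow(x)))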

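library(e1071) # provides skewness() and kurtosis(); load before first use

qqnorm(x.t) # roughly a straight line if x.t is close to normal

(x.grand.mean <- mean(x.t)) # ~0
median(x.t) # ~0
var(x.t) # roughly v/(v-2), with v = n - 1 degrees of freedom
skewness(x.t) # 0 for an exact t distribution; skewed data such as rexp() move it away from 0
kurtosis(x.t, type=1) # excess kurtosis; 6/(v-4) for an exact t distribution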

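theta <- seq(-4, 4, by = .01)
p <- dt(theta, df = n - 1) # theoretical t density with n - 1 degrees of freedom
d <- density(x.t) # kernel density estimate of the simulated t scores

# Overlay both curves on the same scale so they can be compared directly
plot(d, main = "Simulated vs. theoretical t", xlab = "t", ylab = "Density")
lines(theta, p, lty = 2, lwd = 3)
abline(v = x.grand.mean, col = "red") # mean of the simulated t scores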
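The plot gives a visual impression; a numeric complement is to set a few simulated quantiles next to the theoretical ones from `qt()`. The sketch below reruns a smaller version of the simulation so that it stands on its own. The reduced `nsims`, the `rate` variable, and the chosen probabilities are assumptions, and the t scores are centred on the true mean 1/rate.

```r
set.seed(1234)
n <- 50
nsims <- 20000  # reduced from one million to keep this quick
rate <- 5       # exponential rate; the population mean is 1/rate

x <- replicate(nsims, rexp(n, rate))
x.t <- (colMeans(x) - 1/rate) / (apply(x, 2, sd) / sqrt(n))

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
sim_q <- quantile(x.t, probs)   # simulated quantiles
theo_q <- qt(probs, df = n - 1) # quantiles of t with n - 1 df

round(rbind(simulated = sim_q, theoretical = theo_q), 3)
```

Because the exponential population is right-skewed, the simulated tails should not be expected to match the t quantiles exactly at n = 50. How large a gap is acceptable depends on the application.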