**Statistical Research » R**, and kindly contributed to R-bloggers)

Like most of my post these code snippets derive from various other projects. In this example it shows a simulation of how one can determine if a set of t statistics are distributed properly. This can be useful when sampling known populations (e.g. U.S. census or hospital populations) or populations that will soon be known (e.g. pre-election, exit polling). This is a simple example but the concept can be expanded upon to include varying sample sizes and varying known mean values. When collecting data in real life the *nsim* value will likely be only a handful of random samples rather than a million. In this example a fixed constant sample size of 50 is used.

If you’re collecting data and you begin to see that your distribution of t scores begins to deviate from the known distribution then it might be time to tweak some of the algorithms.

set.seed(1234) nsims <- 1000000 n <- 50 x <- replicate(nsims, rexp(n, 5)) x.sd <- apply(x, 2, sd) x.mean <- apply(x, 2, mean) x.t <- (x.mean - 0)/(x.sd/sqrt(nrow(x))) qqnorm(x.t) # follows a normal distribution (x.grand.mean <- mean(x.t)) # ~0 median(x.t) # ~0 var(x.t) # v/(v-2) skewness(x.t) # ~0 library(e1071) kurtosis(x.t, type=1) theta <- seq(-4,4, by=.01) p <- dt(theta, n) p <- p/max(p) d <- density(x.t) plot(d) plot(theta, p, type = "l", ylab = "Density", lty = 2, lwd = 3) abline(v=x.grand.mean, col="red")

**leave a comment**for the author, please follow the link and comment on their blog:

**Statistical Research » R**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...