Visualizing Sampling Distributions

September 25, 2011
By

(This article was first published on bayesianbiologist » Rstats, and kindly contributed to R-bloggers)

Teacher: “How variable is your estimate of the mean?”

Student: “Uhhh, it’s not. I took a sample and calculated the sample mean. I only have one number.”

Teacher: “Yes, but what is the standard deviation of sample means?”

Student: “What do you mean means, I only have the one friggin number.”

Statisticians have a habit of talking about single events as though they’ve happened (or could happen) over and over again.  This is the basis of the Frequentist paradigm, and I’ve found that it really irks early students of statistics. Questions of the type: “How variable is that estimate?” asked by a statistician translates to “How variable would our collection of estimates be if we were to draw samples of the same size from the population, over and over again?”

As a way to help students get into this way of thinking, I have found simulations to be quite useful.  Here is an R script to demonstrate the sampling distribution of means and how we can reproduce the theoretical standard error of the mean.


## This script plots a histogram of sample means from a known population and compares this
## distribution against the theoretical Standard Error of the Means distribution.

## You can play around with sample size (n) to see how the standard error distribution changes.

rm(list=ls())

var_ <- new.env()
n<-20            ## Sample n individuals at a time
p_mean<-0        ## Population mean
p_sd<-1            ## Population standard deviation
N<-500            ## Number of times the experiment (sampling) is replicated

pdf('SE.pdf')

for(i in 1:N)                                ## do the experiment N times
{
smp<-rnorm(n,p_mean,p_sd)                 ## sample n data points from the population

var_$x_bar<-c(var_$x_bar,mean(smp))         ## keep track of the mean (x_bar) from each sample

hist(var_$x_bar,probability=TRUE,col="red",xlim=c(-4,4),xlab="x / x_bar",main="",ylim=c(0,2.2))  # Plot a histogram of x_bar values
points(mean(smp),0,pch=19,cex=1.5,col='black')
curve(dnorm(x,p_mean,p_sd/sqrt(n)),lwd=3,add=TRUE)

text(2.5,1.75,labels=paste('sd/sqrt(n) = ',round(p_sd/sqrt(n),2),sep=''))
text(2.5,1.5,labels=paste('standard deviation of\nsample means = ',round(sd(var_$x_bar),2),sep='') )

curve(dnorm(x,p_mean,p_sd),main="",ylab="",xlim=c(-4,4),xlab="X",col="blue",lwd=3,add=TRUE) ## Plot the sample

text(2.5,0.5,labels=paste('# of means drawn = ',i,sep=''))
text(2.5,0.35,labels=paste('Sample size (n) = ',n,sep=''))
points(smp,rep(0,n),pch=19,cex=1.5,col='purple')
abline(v= mean(smp),col='purple',lwd=4)

legend("topleft",legend=c('Sample points','Population Distribution','Sample mean','Theoretical SE','Empirical SE'),
lty=c(0,1,1,1,1,1,1),lwd=c(0,3,3,3,3,3,3),pch=c(16,NA,NA,NA,NA,NA,NA),col=c('purple','blue','purple','black','red'))

print(paste(i," of ",N))
}
dev.off()

############################################################################################
############################################################################################

The output of the script is a multi-page pdf which can be flipped through to show the building of a histogram of sample means converging on the theoretical sampling distribution.


To leave a comment for the author, please follow the link and comment on their blog: bayesianbiologist » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.

Sponsors

Mango solutions



plotly webpage

dominolab webpage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training

datasociety

http://www.eoda.de





ODSC

ODSC

CRC R books series





Six Sigma Online Training









Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)