# Why trust some supposed laws of statistical sampling and…

August 15, 2012

(This article was first published on Isomorphismes, and kindly contributed to R-bloggers)

Why trust some supposed laws of statistical sampling and convergence when you can just test them yourself? If you have a computer with `R` installed (also recommended: `RStudio`), then you can stop dithering over whether the `n=1000` studies cited in the newspapers resemble the truth closely enough.

```r
# make some people
# let's say 1e5 one-dimensional people characterised by one parameter
# like "wealth" or "health" or "support of some particular policy"
# if you want you can create subsets like "Irish" and "English"
# ... I'll leave that kind of fun to you
n.people <- 1e5
base <- rnorm(n.people, mean = 45, sd = 4)
# a single exp() keeps inheritance skewed but finite:
# exp(exp(exp(k))) overflows to Inf for any Poisson draw k >= 2
inheritance <- exp(rpois(n.people, 1.1))
luck <- base * inheritance * rpois(n.people, 2.1)
# fat-tailed shocks: the Cauchy distribution has no finite mean or variance
extreme.luck <- rcauchy(n.people, location = 45, scale = 4)
# add the components directly; exp() of a sum this large would overflow to Inf
people <- base + inheritance + luck + extreme.luck
# randomly sample 100 of the people, without replacement
Nielsen <- sample(people, 100, replace = FALSE)
# take some statistics of each and compare them
mean(Nielsen)
mean(people)
mean(Nielsen) - mean(people)   # the sampling error of this one survey
# and so on:
# compare histograms, medians, standard deviations, kurtoses...
```

(Notice this is an economy with no geography, no choice, and no response.)
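One survey is one draw. To test the convergence claim itself, you can repeat the survey many times at `n = 100` and at `n = 1000` and look at how far the sample means land from the population mean. The sketch below assumes a lognormal population as a stand-in for "wealth" (chosen only because it is skewed, positive and finite); `population`, `survey.errors` and the seed are illustrative names, not part of the original post. If the usual square-root law holds, the tenfold larger surveys should be about sqrt(10), roughly 3.2, times tighter.

```r
set.seed(42)  # reproducible draws
# assumption: a skewed, strictly positive, finite population of 1e5 people
population <- rlnorm(1e5, meanlog = 3, sdlog = 1)
true.mean <- mean(population)

# hypothetical helper: run `reps` independent surveys of size n and return
# each survey's error (sample mean minus population mean)
survey.errors <- function(n, reps = 2000) {
  replicate(reps, mean(sample(population, n, replace = FALSE)) - true.mean)
}

errors.100  <- survey.errors(100)
errors.1000 <- survey.errors(1000)

# spread of the errors at each sample size, and how much it shrank
sd(errors.100)
sd(errors.1000)
sd(errors.100) / sd(errors.1000)

# compare the two sampling distributions side by side
par(mfrow = c(1, 2))
hist(errors.100,  main = "n = 100",  xlab = "sample mean - population mean")
hist(errors.1000, main = "n = 1000", xlab = "sample mean - population mean")
```

Swapping the lognormal for a Cauchy population is a quick way to watch this square-root behaviour break down.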

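You could also simulate “biased sampling” by grabbing, for example, `people[1:100]` rather than `sample(people, 100, replace = FALSE)`. (That is only biased if the order of `people` means something, for example after `people <- sort(people)`; as generated above, the vector is already in random order.) Or, to be a little biased but also a little random, you could build an index vector such as `indexes.to.sample.from <- floor( runif( 100, min=1, max=316) ^2 )` and take `people[indexes.to.sample.from]`. (Squaring spreads the values out while favouring the earlier indices. Think about what the parabola picture means here!)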
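The three sampling schemes can be compared side by side. This sketch again assumes a lognormal stand-in population (`population`, `sorted.people` and `idx` are illustrative names) and sorts it first so that position in the vector carries meaning. If U is uniform on [1, 316], then P(U^2 <= k) grows like sqrt(k) rather than k, which is why the squared indices pile up near the start.

```r
set.seed(7)
population <- rlnorm(1e5, meanlog = 3, sdlog = 1)   # assumed stand-in population
# sort so that position means something (poorest first);
# without this, the first 100 entries would be just as random as sample()
sorted.people <- sort(population)

srs       <- sample(sorted.people, 100, replace = FALSE)  # simple random sample
first.100 <- sorted.people[1:100]                          # maximally biased
idx       <- floor(runif(100, min = 1, max = 316)^2)       # squared uniforms
squared   <- sorted.people[idx]                            # mildly biased

# share of squared indices in the first quarter of the vector:
# expected (sqrt(25000) - 1) / 315, i.e. roughly 0.5 rather than 0.25
summary(idx)
mean(idx <= 25000)

c(population = mean(sorted.people),
  srs        = mean(srs),
  squared    = mean(squared),
  first.100  = mean(first.100))
```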

It's a nice way to play around with:

• Different functions for generating (and noising up) a bunch of sims
• Different measures of central tendency or spread (is `median` better than `mean`? You can prove it to yourself.)
• `R`. Not that we need more reasons to play around with R, but we will gladly accept them.
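For the `median` versus `mean` question, the Cauchy ingredient (`extreme.luck`) is the telling case. The mean of n independent Cauchy(45, 4) draws is itself Cauchy(45, 4), so averaging more people never helps, while the sample median converges to the centre at the usual 1/sqrt(n) rate. The sketch below checks this by simulation; the seed, sample sizes and the helper names `draw` and `spread.of` are illustrative choices, and it measures spread with the interquartile range because the sample mean of Cauchy data has no finite variance.

```r
set.seed(2012)
# Cauchy shocks, parameterised like extreme.luck: centre 45, scale 4
draw <- function(n) rcauchy(n, location = 45, scale = 4)

# hypothetical helper: IQR of an estimator across `reps` independent samples
# of size n (IQR rather than sd, since an sd of Cauchy means is itself unstable)
spread.of <- function(estimator, n, reps = 1000) {
  IQR(replicate(reps, estimator(draw(n))))
}

# the mean's IQR should stay near 2 * 4 = 8 for every n,
# while the median's IQR shrinks as n grows
results <- sapply(c(10, 100, 1000), function(n)
  c(n = n, mean = spread.of(mean, n), median = spread.of(median, n)))
results
```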
