Trivial, but useful: sequences with defined mean/s.d.

Posted on July 31, 2013 by anspiess in R bloggers

[This article was first published on Rmazing, and kindly contributed to R-bloggers.]


OK, the following post may be (mathematically) trivial, but it could be useful for people who run simulations or test statistical methods.
Let’s say we want to test how the p-values from a t-test depend on a) the ratio of means between two groups, b) the standard deviation or c) the sample size(s) of the two groups. For this setup we would need to generate, e.g., two groups with defined $\mu$, $\sigma$ and $n$.
In simulations, groups are often generated with rnorm and then plugged into the analysis. However, sampling from a normal distribution does not deliver a vector with exactly the requested mean and standard deviation (although the “law of large numbers” states that the sample statistics converge to the requested values as the sample size grows).
For example,

> x <- rnorm(1000, 5, 2) 
> mean(x) 
[1] 4.998388 
> sd(x) 
[1] 2.032262

shows what I meant above: the sample mean and s.d. are close to, but not exactly, the requested values ( $\mu_x \neq 5, \sigma_x \neq 2$ ).
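To see how the gap behaves as the sample grows, here is a small sketch that repeats the draw for a few sample sizes. The seed and the sample sizes are arbitrary choices for illustration, not part of the original post.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- c(10, 1000, 100000)         # illustrative sample sizes

# For each sample size, draw from N(5, 2) and record the sample mean and s.d.
res <- t(sapply(n, function(k) {
  x <- rnorm(k, mean = 5, sd = 2)
  c(n = k, mean = mean(x), sd = sd(x))
}))
print(res)
```

The sample statistics move closer to 5 and 2 as n increases, but they do not hit the targets exactly for any finite sample.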

Luckily, we can create vectors with exact mean and s.d. by a “scaled-and-shifted z-transformation” of an input vector $X$ :

$Z = \frac{X - \mu_X}{\sigma_X} \cdot \mathbf{sd} + \mathbf{mean}$

where $\mu_X$ and $\sigma_X$ are the sample mean and sample standard deviation of $X$, sd is the desired standard deviation and mean the desired mean of the output vector Z.
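Why this gives exact values can be checked in two lines. The derivation below is a sketch that assumes $X$ has $n$ elements, that $\sigma_X$ is the sample standard deviation with the $n - 1$ denominator (as computed by R's sd), and that $\sigma_X > 0$. For the mean of $Z$:

$\frac{1}{n}\sum_{i=1}^{n} Z_i = \frac{\mathbf{sd}}{\sigma_X}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu_X\right) + \mathbf{mean} = \mathbf{mean}$

and for its variance:

$\frac{1}{n-1}\sum_{i=1}^{n}\left(Z_i - \mathbf{mean}\right)^2 = \frac{\mathbf{sd}^2}{\sigma_X^2} \cdot \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \mu_X\right)^2 = \mathbf{sd}^2$

The bracket in the first line is zero by the definition of $\mu_X$, and the sum in the second line equals $\sigma_X^2$, so nothing in the argument depends on $X$ being normally distributed.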

The code is simple enough:

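statVec <- function(x, mean, sd)
{
  # Standardize x to sample mean 0 and sample s.d. 1 (NAs are ignored) ...
  z <- (x - base::mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
  # ... then rescale to the desired s.d. and shift to the desired mean
  mean + sd * z
}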

So, using this on the rnorm-generated vector x from above:

> z <- statVec(x, 5, 2)
> mean(z)
[1] 5
> sd(z)
[1] 2
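Two practical points are worth a quick check: the transformation does not require normal input, and "exact" means exact up to floating-point precision, so all.equal is a safer comparison than ==. The sketch below repeats a compact version of statVec so that it runs on its own; the uniform input, the seed and the target values are arbitrary assumptions for illustration.

```r
# Compact copy of statVec from the post, repeated so the snippet is self-contained
statVec <- function(x, mean, sd) {
  z <- (x - base::mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
  mean + sd * z
}

set.seed(42)                           # arbitrary seed
u  <- runif(50)                        # deliberately non-normal input
zu <- statVec(u, mean = 10, sd = 3)

# Compare with a tolerance instead of ==, since results are floating-point numbers
print(all.equal(mean(zu), 10))
print(all.equal(sd(zu), 3))

# The output is a linear function of the input, so the two are perfectly correlated
print(cor(u, zu))
```

Because the transformation is linear, the output keeps the shape of the input distribution; only location and scale change.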

we have created a vector with exact statistical properties (up to floating-point precision), which still follows the shape of a normal distribution, since multiplying a normally distributed variable by a constant and adding a constant preserves normality.
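As a sketch of the use case from the introduction, the following generates two groups with exact means and a common exact s.d., and records the t-test p-value for several values of the s.d. The group means (5 and 6), the group size and the grid of s.d. values are assumptions chosen for illustration; statVec is repeated in compact form so the snippet runs on its own.

```r
# Compact copy of statVec from the post, repeated so the snippet is self-contained
statVec <- function(x, mean, sd) {
  z <- (x - base::mean(x, na.rm = TRUE)) / stats::sd(x, na.rm = TRUE)
  mean + sd * z
}

set.seed(123)                  # arbitrary seed
n   <- 20                      # illustrative group size
sds <- c(0.5, 1, 2, 4)         # illustrative grid of standard deviations

# For each s.d., build two groups with exact means 5 and 6 and run a Welch t-test
pvals <- sapply(sds, function(s) {
  g1 <- statVec(rnorm(n), mean = 5, sd = s)
  g2 <- statVec(rnorm(n), mean = 6, sd = s)
  t.test(g1, g2)$p.value
})
print(data.frame(sd = sds, p.value = pvals))
```

Since the t statistic depends only on the group means, standard deviations and sizes, fixing these exactly makes the p-value independent of the particular random draw, which isolates the effect of the parameter being varied.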

Cheers, Andrej

Filed under: General Tagged: mean, s.d., sequence, z-transformation
