# Sums of Random Variables

[This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers.]

I’m currently teaching a first-level course in statistical inference for (mostly) economics students. They’ve taken a one-semester course in descriptive (economic) statistics, and now we’re dealing with sampling distributions, estimation, hypothesis testing, and simple regression analysis.

When dealing with the sampling distribution of the sample mean, based on simple random sampling, we derived the result that this distribution has a mean of $\mu$ and a variance of $\sigma^2/n$. Here, $\mu$ and $\sigma^2$ are the population mean and variance, and $n$ is the sample size. I then told the class that if the population happens to be Normal, then the sampling distribution of the sample average will also be Normal, because linear combinations of Normal random variables are also Normally distributed. In fact, this result holds even if the random variables are jointly Normal, and not independent.
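These results are easy to illustrate by simulation. The R sketch below assumes an illustrative Normal population with $\mu = 5$ and $\sigma = 2$, and samples of size $n = 10$ (values chosen purely for the example). It draws many simple random samples, computes each sample mean, and compares the simulated mean, variance and a few quantiles of those averages with $\mu$, $\sigma^2/n$ and the corresponding Normal quantiles.

```r
set.seed(123)     # arbitrary seed, for reproducibility

mu    <- 5        # population mean (illustrative choice)
sigma <- 2        # population standard deviation (illustrative choice)
n     <- 10       # sample size
reps  <- 100000   # number of simulated samples

# Each row of the matrix is one simple random sample of size n
samples <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps)
xbar <- rowMeans(samples)

mean(xbar)   # should be close to mu = 5
var(xbar)    # should be close to sigma^2 / n = 0.4

# Normality of the sample mean: compare a few simulated quantiles
# with those of N(mu, sigma^2 / n)
probs <- c(0.05, 0.5, 0.95)
rbind(simulated = quantile(xbar, probs),
      theory    = qnorm(probs, mean = mu, sd = sigma / sqrt(n)))
```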

We then got to a discussion of the sampling distribution of the sample variance, $s^2$. We proved that $E[s^2] = \sigma^2$, regardless of the population distribution. However, to proceed further I considered only the case of a Normal population, and introduced the students to the Chi-Square distribution. We established that $(n-1)s^2/\sigma^2$ follows a Chi-Square distribution with $(n-1)$ degrees of freedom. [The sample average and $s^2$ are also statistically independent if the population is Normal. For some reason, students at this level generally aren’t told that this result __requires__ Normality.]

At this point, as a by-product of the material we’d covered, the students knew that:

- Linear combinations of Normal random variables are also Normally distributed.
- Sums of (independent) Chi-Square random variables are also Chi-Square distributed.
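The results for the sample variance can be checked numerically, too. This is a sketch under illustrative assumptions (a Normal population with $\mu = 5$, $\sigma = 2$, and $n = 10$): the simulated values of $(n-1)s^2/\sigma^2$ should have a mean close to $n - 1 = 9$ and a variance close to $2(n-1) = 18$, which are the mean and variance of a Chi-Square variate with 9 degrees of freedom. The last few lines contrast the near-zero correlation between the sample mean and $s^2$ for Normal data with the clearly positive correlation for an exponential population (an assumption, chosen only as a convenient skewed example). Zero correlation alone does not prove independence, but non-zero correlation does rule it out.

```r
set.seed(123)

mu    <- 5        # illustrative Normal population
sigma <- 2
n     <- 10
reps  <- 100000

samples <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps)
xbar <- rowMeans(samples)
# Sample variance of each row (divisor n - 1); subtracting xbar recycles
# it down the columns, so row i is centred at xbar[i]
s2 <- rowSums((samples - xbar)^2) / (n - 1)

stat <- (n - 1) * s2 / sigma^2   # should behave like Chi-Square(n - 1)

mean(s2)     # close to sigma^2 = 4
mean(stat)   # close to n - 1 = 9
var(stat)    # close to 2 * (n - 1) = 18

# Near-zero correlation: consistent with (not proof of) independence
cor(xbar, s2)

# Skewed, non-Normal population (exponential, rate 1): the sample mean
# and sample variance are clearly correlated, so not independent
expo <- matrix(rexp(reps * n, rate = 1), nrow = reps)
expo_mean <- rowMeans(expo)
expo_s2   <- rowSums((expo - expo_mean)^2) / (n - 1)
cor(expo_mean, expo_s2)
```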

It would be understandable if a student then presumed that any linear combination of independent Chi-Square variates is Chi-Square distributed. However, this is __not__ the case. Even the difference of two such variables doesn’t follow a Chi-Square distribution (it can take negative values, whereas a Chi-Square variate cannot).

It would also be understandable for a student to presume that, perhaps, sums of independent random variables from the same distribution also follow that distribution. Not so!
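Both Chi-Square claims can be seen in a quick simulation. In this sketch the degrees of freedom (3 and 4) are arbitrary illustrative choices: the sum behaves like a Chi-Square variate with 7 degrees of freedom (mean 7, variance 14), while the difference takes negative values with substantial probability.

```r
set.seed(123)
reps <- 1000000

x1 <- rchisq(reps, df = 3)   # illustrative degrees of freedom
x2 <- rchisq(reps, df = 4)

# Sum of independent Chi-Square(3) and Chi-Square(4) variates:
# should be Chi-Square(7), with mean 7 and variance 14
s <- x1 + x2
c(mean = mean(s), var = var(s))

# Kolmogorov-Smirnov distance between part of the simulated sum and
# the Chi-Square(7) CDF; a small value indicates a close match
ks_d <- ks.test(s[1:10000], "pchisq", df = 7)$statistic
ks_d

# The difference of the same variates takes negative values, which no
# Chi-Square variate can, so it is not Chi-Square distributed
d <- x1 - x2
mean(d < 0)
```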

Students at this level have generally met very few statistical distributions. Usually, the first one that they encounter is the Binomial distribution. Sums of independent Binomial random variables (with the same “success” probability, $p$) __are__ in fact also Binomially distributed. Specifically, if $X_1 \sim \text{Bi}[m, p]$ and $X_2 \sim \text{Bi}[n, p]$, then $(X_1 + X_2) \sim \text{Bi}[(m+n), p]$. This is a trivial result, given the independence of $X_1$ and $X_2$, and the definition of a Binomial random variable in terms of Bernoulli trials.

But what about something as simple as adding together two *independent* random variables, $U_1$ and $U_2$, that each follow a Uniform distribution on [0, 1]? Does this result in a new random variable that is Uniformly distributed? No, it doesn’t!

First of all, it’s easy to see that the “support” of the distribution of this sum (the range of values it can take) is [0, 2], not [0, 1].
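The Binomial result mentioned above can be confirmed exactly, rather than by simulation, by convolving the two probability mass functions. The values $m = 5$, $n = 8$ and $p = 0.3$ are arbitrary choices for illustration.

```r
m <- 5      # illustrative number of trials for X1
n <- 8      # illustrative number of trials for X2
p <- 0.3    # common "success" probability (illustrative)

pmf1 <- dbinom(0:m, size = m, prob = p)   # P(X1 = 0), ..., P(X1 = m)
pmf2 <- dbinom(0:n, size = n, prob = p)   # P(X2 = 0), ..., P(X2 = n)

# Convolution: P(X1 + X2 = s) = sum over j of P(X1 = j) * P(X2 = s - j)
pmf_sum <- sapply(0:(m + n), function(s) {
  j <- max(0, s - n):min(m, s)
  sum(pmf1[j + 1] * pmf2[s - j + 1])
})

# Compare with the Bi[(m + n), p] probabilities
all.equal(pmf_sum, dbinom(0:(m + n), size = m + n, prob = p))   # TRUE
```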


To see what the distribution of $Z = (U_1 + U_2)$ looks like, I’ve generated a million independent values of $U_1$ and $U_2$, added them together, and then plotted the result. You can do this using EViews, with the commands:

```
SMPL 1 1000000
SERIES U1=@RUNIF(0,1)
SERIES U2=@RUNIF(0,1)
SERIES Z=U1+U2
SHOW Z
HIST Z
```

If you graph Z, using the options: Distribution; Histogram; Density; Bin-width user-specified as 0.02, you get:

[Figure: EViews density histogram of Z]

Using the following commands in R:

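```r
# Simulate a million values of Z = U1 + U2, with U1 and U2 independent U[0, 1]
u1 <- runif(1000000)
u2 <- runif(1000000)
z <- u1 + u2
# Density-scaled histogram (freq = FALSE); R treats breaks = 100 as a
# suggestion, so the number of bins is roughly 100
hist(z, freq = FALSE, breaks = 100, col = "lightblue", main = "Density of Z")
```

you get:

[Figure: R density histogram of Z]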

O.K., it seems that the density function is triangular in shape. [__Cross-check__: the area of the triangle is “1”, as it should be for a density (the base runs from 0 to 2 and the peak height is 1, so the area is ½ × 2 × 1 = 1). That’s a good start!]

Now, if you want to establish this result mathematically, rather than by simulation, there are several ways to do it. One is by taking the so-called “convolution” of the densities of $U_1$ and $U_2$. For full details, see p. 292 of the material supplied by the “Chance” team at Dartmouth College.

An alternative way of getting the density function for $Z$ is to take the mapping from the joint density of $U_1$ and $U_2$ to the joint density of $Z$ and $W = (U_1 - U_2)$. The Jacobian for this transformation is 1/2. Once you have the joint density of $Z$ and $W$, you can then integrate out with respect to $W$, to get the triangular density for $Z$.

This triangular distribution that emerges when you add two independent U[0, 1] variates together is actually just a special case of the so-called Irwin-Hall distribution. The latter arises when you take the sum of, say, $k$ independent U[0, 1] random variables.
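For reference, the two routes to the triangular density described above can be written out as follows, together with the general Irwin-Hall density for the sum $S_k$ of $k$ independent U[0, 1] variates. The Irwin-Hall formula is the standard closed form, quoted here rather than derived; setting $k = 2$ in it recovers the same triangle.

$$
f_Z(z) = \int_{-\infty}^{\infty} f_{U_1}(u)\, f_{U_2}(z-u)\, du
       = \int_{\max(0,\,z-1)}^{\min(1,\,z)} 1\, du
       = \begin{cases} z, & 0 \le z \le 1 \\ 2 - z, & 1 < z \le 2 \\ 0, & \text{otherwise} \end{cases}
$$

$$
U_1 = \tfrac{1}{2}(Z + W), \quad U_2 = \tfrac{1}{2}(Z - W), \quad
f_{Z,W}(z, w) = \tfrac{1}{2} \ \text{ for } |w| \le \min(z,\, 2 - z), \quad
f_Z(z) = \int_{-\min(z,\,2-z)}^{\min(z,\,2-z)} \tfrac{1}{2}\, dw = \min(z,\, 2 - z)
$$

$$
f_{S_k}(x) = \frac{1}{(k-1)!} \sum_{j=0}^{\lfloor x \rfloor} (-1)^j \binom{k}{j} (x - j)^{k-1}, \quad 0 \le x \le k; \qquad
E[S_k] = \frac{k}{2}, \quad \operatorname{Var}[S_k] = \frac{k}{12}
$$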

Here’s what the density for this sum looks like, for various choices of $k$:

[Figure: densities of the sum of k independent U[0, 1] variates, for several values of k]
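Curves like these can be drawn from the closed-form Irwin-Hall density, $f(x) = \frac{1}{(k-1)!}\sum_{j=0}^{\lfloor x \rfloor} (-1)^j \binom{k}{j}(x-j)^{k-1}$ on $[0, k]$. In this sketch, `dirwinhall` is a hypothetical helper written for the example (it is not part of base R) and the values of $k$ are illustrative; the last lines check numerically that each density integrates to one.

```r
# Hypothetical helper: Irwin-Hall density of the sum of k independent
# U[0, 1] variates, evaluated at each element of x
dirwinhall <- function(x, k) {
  sapply(x, function(xi) {
    if (xi < 0 || xi > k) return(0)
    j <- 0:floor(xi)
    sum((-1)^j * choose(k, j) * (xi - j)^(k - 1)) / factorial(k - 1)
  })
}

ks   <- c(2, 3, 4, 6)                     # illustrative values of k
cols <- c("black", "red", "blue", "darkgreen")
x    <- seq(0, max(ks), length.out = 601)

plot(NULL, xlim = c(0, max(ks)), ylim = c(0, 1),
     xlab = "x", ylab = "Density", main = "Irwin-Hall densities")
for (i in seq_along(ks)) {
  lines(x, dirwinhall(x, ks[i]), col = cols[i])
}
legend("topright", legend = paste("k =", ks), col = cols, lty = 1)

# Sanity check: each density should integrate to (approximately) 1
areas <- sapply(ks, function(k) {
  integrate(function(t) dirwinhall(t, k), lower = 0, upper = k)$value
})
areas
```

The alternating sum suffers from cancellation as $k$ grows, so this direct form is best kept to modest values of $k$.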

You can see that you don’t have to have a very large value for $k$ before the density looks rather like that of a Normal random variable, with a mean of $(k/2)$ and a variance of $(k/12)$. In fact, this gives a “quick-and-dirty” way of generating a normally distributed random value. We can see this if we take $k = 12$, and subtract 6 from the sum:

[Figure: density of the sum of 12 independent U[0, 1] variates, minus 6]

(We don’t need to do any scaling to get the variance equal to one in value – remember that the variance of a U[0 , 1] variable is 1/12, and we’re summing 12 independent such variables.)
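The “quick-and-dirty” generator takes only a couple of lines in R. This sketch (sample size and seed are arbitrary choices) sums 12 independent U[0, 1] draws per row and subtracts 6, then compares the simulated mean, variance and a few quantiles with those of a standard Normal. Unlike a genuine Normal variate, the result can never fall outside [-6, 6].

```r
set.seed(123)     # arbitrary seed, for reproducibility
reps <- 100000    # number of approximate Normal values to generate

# Each row holds 12 independent U[0, 1] draws; the row sum minus 6 has
# mean 12 * (1/2) - 6 = 0 and variance 12 * (1/12) = 1
u <- matrix(runif(reps * 12), ncol = 12)
z <- rowSums(u) - 6

c(mean = mean(z), var = var(z))

# A few simulated quantiles next to the standard Normal ones
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
round(rbind(simulated = quantile(z, probs), normal = qnorm(probs)), 3)

# z is confined to [-6, 6]
range(z)
```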

Of course, there are much better ways than this to generate Normal variates, but I won’t go into that here.

There’s an interesting, more general, question that we could also ask. What happens if we take the sum of independent random variables which are Uniformly distributed, but over *different* ranges?

In this case, things get much more complicated. There have been some interesting contributions to this problem by Mitra (1971), Sadooghi-Alvandi (2009), and others.
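Even without the general results, the simplest case is easy to explore by simulation. In this sketch the two ranges, [0, 1] and [0, 2], are illustrative assumptions (not taken from the papers cited), and `dtrap` is a hypothetical helper. Convolving the two densities gives a trapezoid rather than a triangle: the density rises as $z/2$ on [0, 1], is flat at 1/2 on [1, 2], and falls as $(3 - z)/2$ on [2, 3].

```r
set.seed(123)
reps <- 1000000

# Sum of independent U[0, 1] and U[0, 2] variates (illustrative ranges)
v <- runif(reps, min = 0, max = 1) + runif(reps, min = 0, max = 2)

# Hypothetical helper: trapezoidal density of U[0, 1] + U[0, 2],
# obtained by convolving the two Uniform densities
dtrap <- function(z) {
  ifelse(z < 0 | z > 3, 0,
         ifelse(z < 1, z / 2,
                ifelse(z <= 2, 1 / 2, (3 - z) / 2)))
}

hist(v, freq = FALSE, breaks = 100, col = "lightblue",
     main = "Density of U[0, 1] + U[0, 2]")
curve(dtrap(x), from = 0, to = 3, add = TRUE, lwd = 2)

# Simulated versus theoretical probabilities for three intervals
mean(v <= 1)            # theory: 1/4
mean(v > 1 & v <= 2)    # theory: 1/2
mean(v > 2)             # theory: 1/4
```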

**References**

- **Hall, P.**, 1927. The Distribution of Means for Samples of Size N Drawn from a Population in which the Variate Takes Values Between 0 and 1, All Such Values Being Equally Probable. *Biometrika*, 19, 240–245.
- **Irwin, J. O.**, 1927. On the Frequency Distribution of the Means of Samples from a Population Having any Law of Frequency with Finite Moments, with Special Reference to Pearson’s Type II. *Biometrika*, 19, 225–239.
- **Mitra, S. K.**, 1971. On the Probability Distribution of the Sum of Uniformly Distributed Random Variables. *SIAM Journal on Applied Mathematics*, 20, 195–198.
- **Sadooghi-Alvandi, S., A. Nematollahi, & R. Habibi**, 2009. On the Distribution of the Sum of Independent Uniform Random Variables. *Statistical Papers*, 50, 171–175.

© 2013, David E. Giles
