# Sums of Random Variables

[This article was first published on **Econometrics Beat: Dave Giles' Blog**, and kindly contributed to R-bloggers.]


We began with the sampling distribution of the sample average, which (under simple random sampling) has mean μ and variance σ^{2}/n. Here, μ and σ^{2} are the population mean and variance, and n is the sample size. I then told the class that if the population happens to be Normal, then the sampling distribution of the sample average will also be Normal, because linear combinations of Normal random variables are also Normally distributed.

We then turned to the sample variance, s^{2}. We proved that E[s^{2}] = σ^{2}, regardless of the population distribution. However, to proceed further I considered only the case of a Normal population, and introduced the students to the Chi-Square distribution. We established that [(n-1)s^{2}/σ^{2}] follows a Chi-Square distribution with (n-1) degrees of freedom.
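The two results for s^{2} can be checked with a quick simulation. This is only a sketch: the sample size, population mean, standard deviation, number of replications and seed are all arbitrary choices of mine, not values from the class.

```r
set.seed(123)                      # arbitrary seed, for reproducibility
n     <- 10                        # sample size (illustrative)
sigma <- 2                         # population standard deviation (illustrative)
reps  <- 20000                     # number of simulated samples

# Each column is one sample of size n from a Normal(5, sigma^2) population
samples <- matrix(rnorm(n * reps, mean = 5, sd = sigma), nrow = n)
s2      <- apply(samples, 2, var)  # sample variances (divisor n - 1)
q       <- (n - 1) * s2 / sigma^2  # should behave like Chi-Square(n - 1)

mean(s2)   # should be close to sigma^2 = 4
mean(q)    # should be close to n - 1 = 9
var(q)     # should be close to 2 * (n - 1) = 18

# Overlay the Chi-Square(n - 1) density on the simulated values
hist(q, freq = FALSE, breaks = 60, col = "lightblue",
     main = "(n-1)s^2/sigma^2 vs. Chi-Square(n-1)")
curve(dchisq(x, df = n - 1), add = TRUE, lwd = 2)
```

Swapping `rnorm` for a non-Normal generator should leave `mean(s2)` close to the population variance, but the Chi-Square fit for `q` will generally break down.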

[The sample average and s^{2} are also statistically independent if the population is Normal. For some reason, students at this level generally aren’t told that this result __requires__ Normality.]
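Here's a rough way to see why Normality matters for this independence result. The sketch compares the correlation between the sample average and s^{2} for a Normal population and for a skewed (Exponential) one. The helper name `mean_var_cor`, the sample size and the seed are my own illustrative choices. Keep in mind that a zero correlation is only consistent with independence; it doesn't prove it.

```r
set.seed(42)    # arbitrary seed
n    <- 10      # sample size (illustrative)
reps <- 20000   # number of simulated samples

# Hypothetical helper: correlation between the sample mean and the sample
# variance across many samples drawn with the generator `draw`
mean_var_cor <- function(draw) {
  samples <- matrix(draw(n * reps), nrow = n)   # one sample per column
  cor(colMeans(samples), apply(samples, 2, var))
}

cor_normal <- mean_var_cor(rnorm)   # standard Normal population
cor_expon  <- mean_var_cor(rexp)    # Exponential(1) population (skewed)

c(normal = cor_normal, exponential = cor_expon)
```

For the Normal population the correlation should be close to zero, while for the Exponential population it should be clearly positive.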

At this point, as a by-product of the material we’d covered, the students knew that:

- Linear combinations of Normal random variables are also Normally distributed.
- Sums of (independent) Chi-Square random variables are also Chi-Square distributed.

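It's tempting to conclude that any linear combination of independent Chi-Square random variables is also Chi-Square distributed. This is __not__ the case. Even the difference of two such variables doesn't follow a Chi-Square distribution.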
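Both Chi-Square statements (sums are Chi-Square, differences are not) are easy to check by simulation. In this sketch the degrees of freedom, 3 and 5, and the seed are arbitrary choices.

```r
set.seed(2024)                 # arbitrary seed
reps <- 100000
x1 <- rchisq(reps, df = 3)     # Chi-Square with 3 degrees of freedom
x2 <- rchisq(reps, df = 5)     # Chi-Square with 5 degrees of freedom

s <- x1 + x2                   # sum of the two variables
d <- x1 - x2                   # difference of the two variables

# A Chi-Square(8) variable has mean 8 and variance 16
c(mean = mean(s), var = var(s))

# Compare a few simulated quantiles of the sum with Chi-Square(8) quantiles
probs <- c(0.25, 0.50, 0.75)
rbind(simulated = quantile(s, probs), chisq8 = qchisq(probs, df = 8))

# A Chi-Square variable can never be negative, but the difference often is
mean(d < 0)
```

The last line gives the share of negative differences, which is enough on its own to rule out a Chi-Square distribution for `d`.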

Sums of independent Binomial random variables that share the same success probability __are__ in fact also Binomially distributed. Specifically, if X_{1} ~ Bi[m , p] and X_{2} ~ Bi[n , p], then (X_{1} + X_{2}) ~ Bi[(m+n) , p]. This is a trivial result, given the independence of X_{1} and X_{2}, and the definition of a Binomial random variable in terms of Bernoulli trials.
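The Binomial result can be verified exactly, without simulation, by convolving the two probability mass functions. This is a minimal sketch; the values of m, n and p are arbitrary.

```r
m <- 4; n <- 6; p <- 0.3   # illustrative values

# PMFs of X1 ~ Bi[m, p] and X2 ~ Bi[n, p]
pmf1 <- dbinom(0:m, size = m, prob = p)
pmf2 <- dbinom(0:n, size = n, prob = p)

# PMF of X1 + X2 by discrete convolution:
# P(X1 + X2 = k) = sum over j of P(X1 = j) * P(X2 = k - j)
pmf_sum <- sapply(0:(m + n), function(k) {
  j <- max(0, k - n):min(m, k)   # values of X1 compatible with a total of k
  sum(pmf1[j + 1] * pmf2[k - j + 1])
})

# Compare with the Bi[m + n, p] PMF (equal up to floating-point error)
all.equal(pmf_sum, dbinom(0:(m + n), size = m + n, prob = p))
```

The comparison should return `TRUE`. If the two variables are given different values of p, the convolution no longer matches any single Binomial PMF.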

What happens if we add together two *independent* random variables, U_{1} and U_{2}, that each follow a Uniform distribution on [0 , 1]? Does this result in a new random variable that is Uniformly distributed? No, it doesn’t!

To see this, I’ve generated a million independent values of U_{1} and U_{2}, added them together, and then plotted the result. You can do this using EViews, with the commands:

**SMPL 1 1000000**

**SERIES U1=@RUNIF(0,1)**

**SERIES U2=@RUNIF(0,1)**

**SERIES Z=U1+U2**

**SHOW Z**

**HIST Z**

If you graph Z, using the options: Distribution; Histogram; Density; Bin-width user-specified as 0.02; you get:

Using the following commands in R:

**u1=runif(1000000)**

**u2=runif(1000000)**

**z=u1+u2**

**hist(z, freq=FALSE, breaks=100, col="lightblue", main="Density of Z")**

you get:

O.K., it seems that the density function is triangular in shape. [__Cross-check__: the area of the triangle is “1”, as it should be for a density. That’s a good start!]

Now, if you want to establish this result mathematically, rather than by simulation, there are several ways to do it. One is by taking the so-called “convolution” of the densities of U_{1} and U_{2}. For full details, see p.292 of **the material** supplied by the **“Chance”** team at Dartmouth College.
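For reference, here is a sketch of how the convolution argument works out. The notation f_Z, f_{U_1} and f_{U_2} for the densities is mine. Each Uniform density equals 1 on [0 , 1] and 0 elsewhere, so the integral just measures the length of the interval on which both factors are non-zero:

```latex
f_Z(z) = \int_{-\infty}^{\infty} f_{U_1}(u)\, f_{U_2}(z-u)\, du
       = \int_{\max(0,\,z-1)}^{\min(1,\,z)} 1 \, du
       = \begin{cases}
           z,     & 0 \le z \le 1, \\
           2 - z, & 1 < z \le 2, \\
           0,     & \text{otherwise.}
         \end{cases}
```

This is a triangle with base 2 and height 1, which is consistent with the area of 1 noted in the cross-check.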

An alternative way of getting the density function for Z is to take the mapping from the joint density of U_{1} and U_{2} to the joint density of Z and W = (U_{1} – U_{2}). The Jacobian for this transformation is 1/2. Once you have the joint density of Z and W, you can then integrate out with respect to W, to get the triangular density for Z.
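The change-of-variables route can be sketched as follows, again writing f for densities (my notation). Inverting the mapping gives U_1 = (Z + W)/2 and U_2 = (Z - W)/2, and the constraints 0 ≤ U_1, U_2 ≤ 1 determine the region where the joint density is non-zero:

```latex
\left| \det \begin{pmatrix}
  \partial u_1 / \partial z & \partial u_1 / \partial w \\
  \partial u_2 / \partial z & \partial u_2 / \partial w
\end{pmatrix} \right|
= \left| \det \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{pmatrix} \right|
= \frac{1}{2}

f_{Z,W}(z, w) = \frac{1}{2},
\qquad 0 \le z \le 2, \quad |w| \le \min(z,\, 2 - z)

f_Z(z) = \int_{-\min(z,\,2-z)}^{\min(z,\,2-z)} \frac{1}{2}\, dw
       = \min(z,\, 2 - z), \qquad 0 \le z \le 2
```

The last line is the same triangular density, written compactly.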

This triangular distribution that emerges when you add two independent U[0 ,1] variates together is actually just a special case of the so-called Irwin-Hall distribution. The latter arises when you take the sum of, say, k independent U[0 ,1] random variables.

Here’s what the density for this sum looks like, for various choices of k:
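A plot along these lines can be produced with a short R sketch that uses the standard closed-form expression for the Irwin-Hall density, f(x; k) = [1/(k-1)!] Σ_{j=0}^{⌊x⌋} (-1)^{j} C(k , j) (x-j)^{k-1} for 0 ≤ x ≤ k. The function name `dirwinhall` is hypothetical (it isn't part of base R), and the values of k are my own choices.

```r
# Irwin-Hall density for the sum of k independent U[0, 1] variables.
# dirwinhall is a hypothetical helper name, not a base R function.
dirwinhall <- function(x, k) {
  sapply(x, function(xi) {
    if (xi < 0 || xi > k) return(0)   # outside the support [0, k]
    j <- 0:floor(xi)
    sum((-1)^j * choose(k, j) * (xi - j)^(k - 1)) / factorial(k - 1)
  })
}

x  <- seq(0, 8, by = 0.01)
ks <- c(1, 2, 4, 8)                   # illustrative choices of k

# Empty plotting region, then one density curve per value of k
plot(NULL, xlim = c(0, 8), ylim = c(0, 1.05),
     xlab = "Sum", ylab = "Density", main = "Irwin-Hall densities")
for (i in seq_along(ks)) lines(x, dirwinhall(x, ks[i]), col = i, lwd = 2)
legend("topright", legend = paste("k =", ks), col = seq_along(ks), lwd = 2)
```

With k = 2 the function reproduces the triangular density: for example, `dirwinhall(c(0.5, 1, 1.5), 2)` evaluates to 0.5, 1 and 0.5.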

You can see that you don’t have to have a very large value for k before the density looks rather like that of a Normal random variable, with a mean of (k/2). In fact, this gives a “quick-and-dirty” way of generating a Normally distributed random value. We can see this if we take k = 12, and subtract 6 from the sum:

(We don’t need to do any scaling to get the variance equal to one in value – remember that the variance of a U[0 , 1] variable is 1/12, and we’re summing 12 independent such variables.)
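The k = 12 case is simple to try in R. This is a minimal sketch with an arbitrary seed and number of replications. Note that the resulting values can never fall outside [-6 , 6], which is one reason the method is only approximate.

```r
set.seed(1)       # arbitrary seed
reps <- 100000    # number of approximate Normal values to generate

# Each row holds 12 independent U[0, 1] draws; sum each row and subtract 6
z12 <- rowSums(matrix(runif(12 * reps), ncol = 12)) - 6

c(mean = mean(z12), var = var(z12))   # should be close to 0 and 1

hist(z12, freq = FALSE, breaks = 100, col = "lightblue",
     main = "Sum of 12 U[0,1] variables, minus 6")
curve(dnorm(x), add = TRUE, lwd = 2)  # standard Normal density, for comparison
```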

Of course, there are much better ways than this to generate Normal variates, but I won’t go into that here.

There’s an interesting, more general, question that we could also ask. What happens if we take the sum of independent random variables which are Uniformly distributed, but over *different* ranges?

In this case, things get much more complicated. There have been some interesting contributions to this problem by Mitra (1971), Sadooghi-Alvandi (2009), and others.
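The simplest instance, with just two variables, can at least be explored by simulation. The sketch below adds a U[0 , 1] and a U[0 , 3] variable; the ranges and seed are arbitrary choices of mine, and this is not the method used in the papers cited above. For two Uniform variables with different range widths the density of the sum is trapezoidal rather than triangular.

```r
set.seed(7)                            # arbitrary seed
reps <- 1000000
v1 <- runif(reps, min = 0, max = 1)    # U[0, 1]
v2 <- runif(reps, min = 0, max = 3)    # U[0, 3] (illustrative range)
s  <- v1 + v2

hist(s, freq = FALSE, breaks = 100, col = "lightblue",
     main = "Density of U[0,1] + U[0,3]")

# Share of draws on the flat part of the trapezoid, [1, 3];
# the exact probability is 2/3
mean(s >= 1 & s <= 3)
```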

**References**

**Hall, P.**, 1927. The Distribution of Means for Samples of Size N Drawn from a Population in which the Variate Takes Values Between 0 and 1, All Such Values Being Equally Probable. *Biometrika*, 19, 240–245.

**Irwin, J.O.**, 1927. On the Frequency Distribution of the Means of Samples from a Population Having any Law of Frequency with Finite Moments, with Special Reference to Pearson’s Type II. *Biometrika*, 19, 225–239.

**Mitra, S. K.**, 1971. On the Probability Distribution of the Sum of Uniformly Distributed Random Variables. *SIAM Journal on Applied Mathematics*, 20, 195–198.

**Sadooghi-Alvandi, S., A. Nematollahi, & R. Habibi**, 2009. On the Distribution of the Sum of Independent Uniform Random Variables. *Statistical Papers*, 50, 171–175.
