Will Mu Go Out With Median

May 28, 2013
By

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)

True story (no really, this did actually happen). While I was in grad school, one of the other teaching assistants was approached by a student and asked, “Will mu go out with median?” The teaching assistant thought the play on words was pretty funny, laughed, and then cluelessly walked away. All of us other grad students were surprised, because we knew that really was mean.

There are a lot of ways to calculate a measure of center. Here are several examples: the arithmetic mean, the geometric mean, the harmonic mean, and, for good measure, the median.

Histogram of Pythagorean Means

Arithmetic Mean

By far the most common is the arithmetic mean (aka the average). It is calculated by adding up a list of numbers and dividing the sum by the count of those numbers. It is useful when there are many numbers that add up to a total. What does this tell us? Picture a teeter-totter with equally heavy kids sitting at positions given by the numbers: the mean is where the bar balances. It doesn't really matter how many kids are on either side; the balance point is simply where the kids' combined leverage (weight times distance from the pivot) is even on each side.
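As a small illustration, here is a minimal R sketch with a made-up vector `x`: the mean is the sum divided by the count, and the deviations from it sum to zero, which is the teeter-totter balance in numeric form.

```r
# Made-up example values
x <- c(2, 3, 10)

# Arithmetic mean: sum divided by count (same result as mean(x))
arith_mean <- sum(x) / length(x)   # 15 / 3 = 5

# Signed distances from the "pivot"
deviations <- x - arith_mean       # -3, -2, 5

# Left and right sides cancel out, so the bar balances at the mean
sum(deviations)                    # 0
```

Note that the two kids on the left balance the single kid on the right, which is why the count on each side does not matter.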

Geometric Mean

Less commonly used is the geometric mean. It is appropriate when quantities multiply together to produce a product, which makes it the better choice when dealing with proportional growth. Take, for example, an investment such as a 401k. If it grows 8% the first year, 12% the second, and 11% the third, the average annual growth rate is given by the geometric mean, not the arithmetic mean. Written as growth factors, the rates become 1.08 for the first year, 1.12 for the second, and 1.11 for the third. The geometric mean growth rate is then calculated as \left(\prod_{n=1}^{3} r_n\right)^{1/3} - 1 = \left(1.08 \cdot 1.12 \cdot 1.11\right)^{1/3} - 1 \approx 10.32\% , where r_n is the growth factor for year n.

This table shows that compounding a starting balance of 1000 at the geometric mean rate every year ends at the same value as applying each year's actual rate in turn.

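Year | Rate | Growth factor | Balance (yearly rates) | Balance (geometric mean rate)
0    |      |               | 1000                   | 1000
1    | 0.08 | 1.08          | 1080                   | 1103.201691
2    | 0.12 | 1.12          | 1209.6                 | 1217.053972
3    | 0.11 | 1.11          | 1342.66                | 1342.66

Geometric mean rate: 0.103201691 (about 10.32%)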
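The same calculation can be sketched in R. This is a minimal reproduction of the table above, assuming a starting balance of 1000; the variable names (`rates`, `factors`, `geo_rate`) are just illustrative.

```r
# Annual growth rates from the 401k example
rates   <- c(0.08, 0.12, 0.11)
factors <- 1 + rates                      # growth factors 1.08, 1.12, 1.11

# Geometric mean of the growth factors, minus 1, gives the average rate
geo_rate <- prod(factors)^(1 / length(factors)) - 1   # about 0.1032

# Balance when applying each year's actual rate
yearly <- 1000 * cumprod(factors)         # 1080, 1209.6, 1342.656

# Balance when applying the geometric mean rate every year
geo <- 1000 * (1 + geo_rate)^(1:3)        # about 1103.20, 1217.05, 1342.66

data.frame(year = 1:3, rate = rates,
           yearly = round(yearly, 2), geo_mean = round(geo, 2))
```

Only the final balances agree; the intermediate years differ because the geometric mean smooths the path while preserving the end point.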


Harmonic Mean

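The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals, n / \sum_{i=1}^{n} (1/x_i), so like the arithmetic mean it is additive in nature, but in 1/x rather than in x. As a result, large quantities get dampened down: they contribute very little to the sum of reciprocals. Consequently, it can be used in some situations when there are large outliers, although it is pulled down sharply by values close to zero. This mean is also useful in a variety of areas, including machine learning, where the F1 score of a classifier is the harmonic mean of its precision and recall.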
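A minimal R sketch of both points, using a hypothetical helper `harmonic()` (unweighted) and made-up numbers: a single large outlier barely moves the harmonic mean, and the F1 score is the harmonic mean of precision and recall.

```r
# Hypothetical helper: unweighted harmonic mean
harmonic <- function(x) length(x) / sum(1 / x)

# One large outlier drags the arithmetic mean but not the harmonic mean
x <- c(1, 2, 3, 100)
mean(x)        # 26.5
harmonic(x)    # about 2.17

# F1 score as the harmonic mean of (made-up) precision and recall
precision <- 0.9
recall    <- 0.5
f1 <- harmonic(c(precision, recall))   # about 0.643
```

The usual closed form 2PR / (P + R) for F1 is the same quantity written for two values.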

Median

The median is another measure of center. Unlike the arithmetic mean, however, it is much less sensitive to outliers. For example, when determining a measure of center for national income, the mean income comes out different from the median income because a small number of very wealthy earners pull the mean upward. The median is often the better measure of center in such cases, since it identifies the middle point with half of the observations on either side.
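Here is a small R illustration with hypothetical incomes (in thousands); the values are made up purely to show the effect of a single very large income on each measure.

```r
# Hypothetical incomes in thousands, with one very high earner
incomes <- c(30, 35, 40, 45, 50, 1000)

mean(incomes)          # 200: pulled far above most of the values
median(incomes)        # 42.5: the middle of the distribution

# Drop the high earner and compare how much each measure moves
mean(incomes[-6])      # 40
median(incomes[-6])    # 40
```

Removing one observation moves the mean from 200 to 40, while the median only moves from 42.5 to 40.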

The following code snippets show the three Pythagorean means (arithmetic, geometric, harmonic) as well as the median, in both unweighted and weighted form.

### Generate some fake data: 25 sorted normal values (column 1)
### and 25 Poisson counts used as weights (column 2)
set.seed(2013)
x <- cbind(sort(rnorm(25, 10, 1)), rpois(25, 10))

### Weighted median: sort the values, accumulate the weights, and return
### the value where the cumulative weight first reaches one half
weighted.median <- function(X, w = 1) {
  ### A single weight means every value is weighted equally
  if (length(w) == 1) w <- rep(1, length(X))

  keep <- complete.cases(X, w)   # drop missing values
  X <- X[keep]; w <- w[keep]
  o <- order(X)                  # sort the values, carrying weights along
  X <- X[o]; w <- w[o]
  p <- cumsum(w) / sum(w)        # cumulative proportion of the total weight

  k <- which(p >= 0.5)[1]        # first position at or past the middle
  ### If the cumulative weight hits exactly one half, the middle falls
  ### between two values, so average them (matches median() for equal weights)
  if (isTRUE(all.equal(p[k], 0.5)) && k < length(X)) {
    return(mean(X[k:(k + 1)]))
  }
  X[k]
}

### Weighted harmonic mean: total weight over the weighted sum of reciprocals
harmonic.mean <- function(x, w = 1) {
  if (length(w) == 1) w <- rep(1, length(x))
  sum(w) / sum(w / x)
}

### Weighted geometric mean, computed on the log scale.
### prod(x^w)^(1/sum(w)) is equivalent but can overflow for large inputs.
geometric.mean <- function(x, w = 1) {
  if (length(w) == 1) w <- rep(1, length(x))
  exp(sum(w * log(x)) / sum(w))
}

### Unweighted and weighted measures of center
mean(x[, 1])
weighted.mean(x[, 1], x[, 2])

median(x[, 1])
weighted.median(x[, 1], x[, 2])

harmonic.mean(x[, 1], x[, 2])
harmonic.mean(x[, 1])

geometric.mean(x[, 1], x[, 2])
geometric.mean(x[, 1])

### Histogram of the values only (column 2 holds the weights),
### zoomed in around the four weighted measures of center
centers <- c(weighted.mean(x[, 1], x[, 2]), weighted.median(x[, 1], x[, 2]),
             harmonic.mean(x[, 1], x[, 2]), geometric.mean(x[, 1], x[, 2]))
cols <- c("red", "blue", "green", "purple")
hist(x[, 1], breaks = 100, xlim = range(centers) + c(-0.5, 0.5),
     main = "Histogram of Pythagorean Means", xlab = "x")
abline(v = centers, col = cols, lwd = 2)
legend("topright", lwd = 2, col = cols,
       legend = c("Arithmetic mean", "Median", "Harmonic mean", "Geometric mean"))
