Distribution of Mean of the Combinations of a Set.

January 24, 2017

(This article was first published on Data R Value, and kindly contributed to R-bloggers)

For some purpose I found myself generating and analyzing the averages of the combinations of a set, and when I generated the corresponding histogram I was surprised by its shape.

Recall that the combinations of a set of m elements taken n at a time are its subsets of size n, and that C(m, n) denotes how many such subsets there are.

The number of combinations is calculated with:

C(m, n) = m! / (n! (m - n)!)
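
As a quick sanity check on that count, here is a minimal sketch using base R's choose(). The variable names n_comb, n_comb_factorial and small_check are only illustrative, and the factorial version is approximate because it works in floating point.

```r
m <- 50
n <- 6

# Number of combinations of m elements taken n at a time
n_comb <- choose(m, n)   # 15890700 for m = 50, n = 6

# Same count from the factorial definition (floating point, so compare with all.equal)
n_comb_factorial <- factorial(m) / (factorial(n) * factorial(m - n))
isTRUE(all.equal(n_comb, n_comb_factorial))

# On a small case, the count matches the number of columns combn() returns
small_check <- ncol(combn(5, 3)) == choose(5, 3)
small_check
```

With m = 50 and n = 6 there are almost 16 million combinations, so the matrix of all combinations takes a noticeable amount of memory and time to build.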

This is the very simple code to generate the combinations, calculate their mean and generate the histogram:

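m <- 50  # the set is 1..m
n <- 6   # elements per combination

# One combination per row (choose(m, n) rows, so this matrix is large)
COMBINATIONS <- t(combn(m, n))

# Mean of each combination
C_M <- rowMeans(COMBINATIONS)

# breaks is only a suggestion: hist() rounds it to "pretty" bin edges
hist_all <- hist(C_M, breaks = length(unique(C_M)), col = "blue")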

Interesting histogram. It’s as if there are two distributions.
But if we change the value of n to:

m <- 50
n <- 4

We obtain the following histogram:

Although it is a very simple math and programming exercise, the interesting part is interpreting why the histograms behave this way, which turns it into an exercise in understanding the visualization.
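
One possible way to start on that interpretation is to compare the exact frequency of each distinct mean with how hist() actually bins those values. The sketch below is not a definitive explanation. It assumes a smaller set (m = 20 instead of 50) purely to keep it quick, and the names means, exact, vals and distinct_per_bin are illustrative. It relies on the documented behavior that a single number passed as breaks is only a suggestion, so the real bin edges come from h$breaks.

```r
m <- 20  # assumption: smaller than the article's 50 so this runs in moments
n <- 6

# Mean of every combination, without transposing the combinations matrix
means <- colMeans(combn(m, n))

# Exact frequency of each attainable mean (one entry per distinct value)
exact <- table(means)

# Bin edges hist() really uses when given the same suggestion as in the article
h <- hist(means, breaks = length(unique(means)), plot = FALSE)

# Number of distinct mean values that fall into each of those bins
vals <- sort(unique(means))
distinct_per_bin <- hist(vals, breaks = h$breaks, plot = FALSE)$counts

# How many bins hold 0, 1, 2, ... distinct values
table(distinct_per_bin)

# Plotting the exact frequencies avoids binning altogether
plot(as.numeric(names(exact)), as.vector(exact), type = "h",
     xlab = "mean of combination", ylab = "frequency")
```

If the bins turn out to hold unequal numbers of distinct means, bars of quite different heights can sit next to each other for reasons that come from the binning rather than from the data, which is worth keeping in mind when reading the histograms above.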

