
"The cutoff date for almost all nonschool baseball leagues in the United States is July 31, with the result that more major league players are born in August than in any other month."
Malcolm Gladwell, Outliers

This is a quick analysis to check Gladwell’s assertion above, using data scraped from www.baseball-reference.com. Here’s the evidence:

Distribution of birth months for Major League Baseball players.
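The scraping and plotting code is not shown in the post. As a rough sketch only, this is one way a month column like `baseball$month` could be derived from birth dates and tabulated. The `birth` vector and its dates are made up for illustration and are not the scraped data.

```r
# Hypothetical birth dates, for illustration only (NOT the scraped data)
birth <- as.Date(c("1985-08-14", "1990-01-03", "1978-08-30",
                   "1992-11-21", "1988-04-09"))

baseball <- data.frame(birth = birth)

# Convert the month number to a factor with all 12 levels,
# so that months without any births still appear in the table
month_number <- as.integer(format(baseball$birth, "%m"))
baseball$month <- factor(month.abb[month_number], levels = month.abb)

month_counts <- table(baseball$month)
print(month_counts)

# A simple bar chart of the distribution could then be drawn with:
# barplot(month_counts, xlab = "Birth month", ylab = "Players")
```

Keeping the month as a factor with fixed levels means the table always has 12 cells in calendar order, which is what the chi-squared test later relies on.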

We can make a quick check to see whether the non-uniformity is statistically significant.

> chisq.test(table(baseball$month))
Chi-squared test for given probabilities
data: table(baseball$month)
X-squared = 135, df = 11, p-value <2e-16

Yup, it appears to be highly significant.
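To see what the test is doing, the statistic can be reproduced by hand: under the null hypothesis every month is equally likely, so the expected count in each month is the total divided by 12. The sketch below uses hypothetical counts (not the baseball data) and checks the manual calculation against `chisq.test()`.

```r
# Hypothetical monthly counts, for illustration only (NOT the scraped data)
counts <- c(Jan = 80, Feb = 75, Mar = 82, Apr = 78, May = 79, Jun = 77,
            Jul = 74, Aug = 105, Sep = 90, Oct = 88, Nov = 84, Dec = 81)

# Expected counts if births were spread uniformly over the 12 months
expected <- rep(sum(counts) / length(counts), length(counts))

# Pearson chi-squared statistic and its upper-tail p-value
stat <- sum((counts - expected)^2 / expected)
p_value <- pchisq(stat, df = length(counts) - 1, lower.tail = FALSE)

# The built-in test should agree with the manual calculation
res <- chisq.test(counts)
print(c(manual = stat, builtin = unname(res$statistic)))
```

With 12 categories there are 11 degrees of freedom, which matches the `df = 11` reported in the output above.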

The lengths of the months should also make a small difference to the number of births. For example, all else being equal, we would expect more births in August (with 31 days) than in September (with only 30 days). We can be a bit more rigorous and take month lengths into account too.

> chisq.test(table(baseball$month), p = month$length / sum(month$length))
Chi-squared test for given probabilities
data: table(baseball$month)
X-squared = 115, df = 11, p-value <2e-16

Looks like the outcome is the same: there is a significant non-uniformity in the birth months of Major League Baseball players.
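The `month` data frame used in the second test is not defined in the post. A minimal sketch of how it might be constructed follows. The column names and the use of 28.25 days for February (a rough average over leap years) are assumptions, and the counts are again hypothetical.

```r
# Assumed reconstruction of `month`; the original definition is not shown
month <- data.frame(
  name   = month.abb,
  length = c(31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
)

# Null probabilities proportional to the number of days in each month
p_days <- month$length / sum(month$length)

# Hypothetical monthly counts, for illustration only (NOT the scraped data)
counts <- c(Jan = 80, Feb = 75, Mar = 82, Apr = 78, May = 79, Jun = 77,
            Jul = 74, Aug = 105, Sep = 90, Oct = 88, Nov = 84, Dec = 81)

res_uniform <- chisq.test(counts)              # every month equally likely
res_days    <- chisq.test(counts, p = p_days)  # weighted by month length

print(c(uniform = unname(res_uniform$statistic),
        by_days = unname(res_days$statistic)))
```

The probabilities passed as `p` must sum to 1, which is why the lengths are divided by their total rather than passed in directly.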


To leave a comment for the author, please follow the link and comment on their blog: R – Exegetic Analytics.