# Who will be the next President of the US?

May 5, 2011

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers].

A lot of weird facts (?) can be found on the internet. For instance, about the height of the winner of presidential elections in the US: the taller candidate always wins…

“Still, being short does, on average, hurt a person’s prospects… The tall guy gets the girl. The taller presidential candidate almost always wins.” (here)

“from 1900 to 1968 the man elected U.S. president was always the taller of the two candidates.” (there)

or “I remember the subversive effect the observation had on me that in every U.S. presidential race, the taller of the two candidates had been elected” (here).

Well, if this were true, it would be simple to build a prediction model. But perhaps we should get back to real data. It is possible to build a dataset by simply copying a table with the heights of US presidents (here), actually with the winner and the loser(s).

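```r
# Read the table of candidate heights (the first three lines are skipped)
President=read.table("http://freakonometrics.blog.free.fr/public/data/us-president-height.csv",skip=3)
# Height of the winner (Y) and of the loser (X): first three characters of each cell
Y=as.numeric(substr(as.character(President$V4),1,3))
X=as.numeric(substr(as.character(President$V8),1,3))
plot(X,Y,xlab="loser",ylab="winner")
# Above the diagonal (green) the winner is taller, below it (blue) the loser is taller
polygon(c(0,250,0),c(0,250,250),col="light green")
polygon(c(0,250,250),c(0,250,0),col="light blue")
points(X,Y,pch=19,col="red")
```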
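The two `substr()` calls only work if each height cell starts with a three-digit value. Here is a minimal sketch of that parsing step on made-up strings; the `"188 cm"` format and the name `raw` are assumptions for illustration, not the actual content of the file.

```r
# Hypothetical height cells; the real file's format is assumed, not checked
raw <- factor(c("188 cm", "170 cm", "193 cm"))

# Keep the first three characters of each cell and convert them to numbers
height <- as.numeric(substr(as.character(raw), 1, 3))
height
```

A cell that does not start with a number becomes `NA` (with a warning), which is presumably why the tests further down use `na.rm=TRUE`.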

First, we plot the height of the winner versus the height of the loser,

where in the green area, the taller wins, and in the blue area, the shorter wins.
So, obviously, it is not that simple…
But we can go one step further: the height of the candidates might have an
influence only if electors actually see the candidates, so perhaps the height
has only a recent influence.
Here is the graph of the evolution of the height of the candidates, with a linear trend, a green one for the winner, and a blue one for the loser.

```r
Z=as.numeric(as.character(President$V1))
plot(c(Z,Z),c(X,Y))
abline(lm(Y~Z),col="light green",lwd=2)
abline(lm(X~Z),col="light blue",lwd=2)
```

Somehow, the winner is getting taller much faster than the loser (there is an overall increase of the population height over two centuries). Maybe it is time to run some tests, to see if the height can truly be used to predict the winner of US elections,

```r
> Z1=(Y>=X)
> Z2=(Y>X)
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 2.2222, df = 1, p-value = 0.932
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.7407815
sample estimates:
        p
0.6222222

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 0.0889, df = 1, p-value = 0.6172
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.6605522
sample estimates:
        p
0.5333333
```

In only 53% of the elections is the winner strictly taller (and in 62% of the elections, he is at least as tall). Here, we do not (statistically) reject the assumption that the taller wins at least half of the time. But the effect is even stronger if we focus only on the elections after 1918 (following World War I),
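To see where these numbers come from, the first test can be recomputed by hand. This is only a sketch: the counts 28 and 45 are not printed in the output above, they are inferred from the reported estimate 0.6222222 and the statistic 2.2222.

```r
# Counts inferred from the reported estimate (28/45 = 0.6222), not read from the data
x <- 28      # elections where the winner is at least as tall
n <- 45      # elections with both heights available
p0 <- 1/2    # null probability

# Chi-squared statistic with continuity correction
stat <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# One-sided p-value for alternative = "less"
z <- sign(x / n - p0) * sqrt(stat)
pval <- pnorm(z)

round(c(statistic = stat, p.value = pval), 4)
```

If the inferred counts are right, this should reproduce the `X-squared = 2.2222` and `p-value = 0.932` reported by `prop.test()`.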

```r
> I=Z>1918
> Z1=(Y>=X)[I]
> Z2=(Y>X)[I]
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 6.2609, df = 1, p-value = 0.9938
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.9049412
sample estimates:
        p
0.7826087

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 2.7826, df = 1, p-value = 0.9524
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.8423696
sample estimates:
        p
0.6956522
```

In almost 80% of the elections following WWI, the winner was at least as tall as the loser (and strictly taller in 70% of them). I guess I have here a nice and simple model to predict who will win the election next year…
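The tests above use `alternative="less"`, so a large p-value only says that the data do not contradict a proportion of at least 1/2. One possible complement, which is not part of the original analysis, is to run the same test in the other direction. The counts below (18 and 16 out of 23) are inferred from the reported estimates 0.7826087 and 0.6956522, not read from the data.

```r
# Counts inferred from the reported post-1918 estimates (assumption)
n <- 23       # elections after 1918 with both heights available
x_ge <- 18    # winner at least as tall: 18/23 = 0.7826
x_gt <- 16    # winner strictly taller:  16/23 = 0.6957

# Same one-sample proportion test, but against the alternative p > 1/2
t_ge <- prop.test(x_ge, n, p = 1/2, alternative = "greater")
t_gt <- prop.test(x_gt, n, p = 1/2, alternative = "greater")

c(at_least_as_tall = t_ge$p.value, strictly_taller = t_gt$p.value)
```

For this test the two one-sided p-values sum to one, so these should come out close to 1 - 0.9938 and 1 - 0.9524, that is, roughly 0.006 and 0.048.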
