Who will be the next President of the US?

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A lot of weird facts (?) can be found on the internet. For instance, about the height of the winner of Presidential elections in the US: the taller candidate always wins… “Still, being short does, on average, hurt a person’s prospects…The tall guy gets the girl. The taller presidential candidate almost always wins.” (here) “from 1900 to 1968 the man elected U.S. president was always the taller of the two candidates.” (there) or “I remember the subversive effect the observation had on me that in every U.S. presidential race, the taller of the two candidates had been elected” (here). Well, if this were true, it would be simple to build a prediction model. But perhaps we should get back to real data. It is possible to build a dataset simply by copying a table with the heights of US presidents (here), which actually lists both the winner and the loser(s) of each election.

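# Heights of US presidential candidates: winner and loser(s) of each election
President=read.table("http://freakonometrics.blog.free.fr/public/data/us-president-height.csv",skip=3)
# Height (three-digit value, presumably in cm) from columns V4 (winner) and V8 (loser)
Y=as.numeric(substr(as.character(President$V4),1,3))
X=as.numeric(substr(as.character(President$V8),1,3))
# Scatterplot: loser's height on the x-axis, winner's height on the y-axis
plot(X,Y,xlab="loser",ylab="winner")
# Green triangle above the diagonal (winner taller), blue triangle below (winner shorter)
polygon(c(0,250,0),c(0,250,250),col="light green")
polygon(c(0,250,250),c(0,250,0),col="light blue")
# Redraw the points on top of the shaded areas
points(X,Y,pch=19,col="red")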

First, we plot the height of the winner against the height of the loser. In the green area (above the diagonal) the taller candidate wins, and in the blue area (below it) the shorter one wins.
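Counting the points in each shaded area amounts to classifying each election by the sign of the winner's height minus the loser's. A minimal sketch of that tally, using small hypothetical vectors (not the actual dataset) where winner and loser play the roles of Y and X above:

```r
# Hypothetical heights in cm (NOT the real dataset), one entry per election
winner <- c(188, 182, 175, 193, 180, NA)
loser  <- c(180, 185, 175, 178, 183, 177)

# sign(winner - loser): 1 = taller won (green area), -1 = shorter won (blue area),
# 0 = same height (on the diagonal), NA = a height is missing
outcome <- factor(sign(winner - loser), levels = c(-1, 0, 1),
                  labels = c("shorter won", "tie", "taller won"))

# table() drops NA values by default
tally <- table(outcome)
print(tally)

# Shares comparable to Z1 (winner at least as tall) and Z2 (strictly taller) below
share_at_least <- mean(winner >= loser, na.rm = TRUE)
share_strict   <- mean(winner > loser, na.rm = TRUE)
print(c(share_at_least, share_strict))
```

Ties sit exactly on the diagonal, which is why "at least as tall" and "strictly taller" give different shares.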
So, obviously, it is not that simple… But we can go one step further: the height of a candidate can only have an influence if voters actually see the candidates, so perhaps height has only had an influence recently.
Here is the evolution of the height of the candidates over time, with linear trends: green for the winners, blue for the losers.

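# Election year
Z=as.numeric(as.character(President$V1))
# Heights of all candidates against the election year
plot(c(Z,Z),c(X,Y),xlab="year",ylab="height")
# Linear trends: winners in green, losers in blue
abline(lm(Y~Z),col="light green",lwd=2)
abline(lm(X~Z),col="light blue",lwd=2)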
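Two separate regressions give two slopes but no direct comparison between them. One standard alternative (swapped in here, not the author's code) is a single model with a year-by-role interaction, whose interaction coefficient estimates the difference between the winners' and the losers' slopes. The data below are hypothetical and only illustrate the syntax:

```r
# Hypothetical data (NOT the real dataset): election years and heights in cm
year   <- c(1900, 1920, 1940, 1960, 1980, 2000)
winner <- c(176, 178, 181, 183, 185, 188)
loser  <- c(175, 176, 178, 179, 180, 181)

# Long format: one row per candidate, with the candidate's role as a factor
d <- data.frame(
  year   = rep(year, times = 2),
  height = c(winner, loser),
  role   = factor(rep(c("winner", "loser"), each = length(year)),
                  levels = c("loser", "winner"))
)

# height ~ year * role expands to year + role + year:role;
# "loser" is the baseline level, so year:rolewinner is the
# winners' slope minus the losers' slope
fit <- lm(height ~ year * role, data = d)

# Estimate, standard error, t statistic and p-value for the slope difference
print(summary(fit)$coefficients["year:rolewinner", ])
```

The fitted lines are the same as with two separate lm() calls; the difference is that the standard error pools the residual variance of both groups, which is an assumption of this model.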

Interestingly, the winners seem to be getting taller much faster than the losers (both trends also reflect the overall increase in population height over two centuries). Maybe it is time to run some tests, to see if height can truly be used to predict the winner of US elections:

> Z1=(Y>=X)
> Z2=(Y>X)
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 2.2222, df = 1, p-value = 0.932
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.7407815
sample estimates:
p
0.6222222
 
> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 0.0889, df = 1, p-value = 0.6172
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.6605522
sample estimates:
p
0.5333333

In only 53% of the elections is the winner strictly taller (and in 62% of them he is at least as tall). With alternative="less", the test cannot reject the hypothesis that the taller candidate wins at least half of the time. But the pattern is even stronger if we focus only on the elections since World War I (after 1918),

> I=Z>1918
> Z1=(Y>=X)[I]
> Z2=(Y>X)[I]
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 6.2609, df = 1, p-value = 0.9938
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.9049412
sample estimates:
p
0.7826087
 
> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")
 
1-sample proportions test with continuity correction
 
data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 2.7826, df = 1, p-value = 0.9524
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.8423696
sample estimates:
p
0.6956522

In almost 80% of the elections following WWI, the winner was at least as tall as the loser (and in 70% he was strictly taller). I guess I have here a nice and simple model to predict who will win the elections next year…
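A caveat on the direction of the tests above: with alternative="less", prop.test tests H0: p ≥ 1/2 against p < 1/2, so a large p-value only means we cannot reject that the taller candidate wins at least half the time. Testing the claim itself calls for alternative="greater". A minimal sketch, using the counts recovered from the printed estimates (28/45 = 0.6222222, 24/45 = 0.5333333, 18/23 = 0.7826087, 16/23 = 0.6956522):

```r
# Counts recovered from the printed estimates above (e.g. 28/45 = 0.6222222)
counts <- data.frame(
  period  = c("all", "all", "after 1918", "after 1918"),
  outcome = c("winner >= loser", "winner > loser",
              "winner >= loser", "winner > loser"),
  x = c(28, 24, 18, 16),
  n = c(45, 45, 23, 23)
)

# One-sided test of H1: p > 1/2, i.e. the taller candidate wins more often than not
counts$p_greater <- mapply(function(x, n) {
  prop.test(x, n, p = 1/2, alternative = "greater")$p.value
}, counts$x, counts$n)

# The same calls with alternative = "less" reproduce the p-values printed above
counts$p_less <- mapply(function(x, n) {
  prop.test(x, n, p = 1/2, alternative = "less")$p.value
}, counts$x, counts$n)

print(counts)
```

Because the test statistic does not depend on the direction of the alternative, each "greater" p-value is one minus the corresponding printed one: roughly 0.068 and 0.38 over all elections, and 0.006 and 0.048 after 1918. Only the post-1918 figures fall below the usual 5% level.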

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english.
