Who will be the next President of the US?
A lot of weird facts (?) can be found on the internet. For instance, about the height of the winner of Presidential elections in the US: the taller candidate always wins… “Still, being short does, on average, hurt a person’s prospects…The tall guy gets the girl. The taller presidential candidate almost always wins.” (here) “from 1900 to 1968 the man elected U.S. president was always the taller of the two candidates.” (there) or “I remember the subversive effect the observation had on me that in every U.S. presidential race, the taller of the two candidates had been elected” (here). Well, if this were true, it would be simple to build a prediction model. But perhaps we should go back to real data. It is possible to build a dataset simply by copying a table with the heights of US presidents (here), actually with the winner and the loser(s).
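# read the table of elections (the first three lines of the file are skipped)
President=read.table("http://freakonometrics.blog.free.fr/public/data/us-president-height.csv",skip=3)
# the first three characters of columns V4 and V8 are read as the heights of the winner (Y) and the loser (X)
Y=as.numeric(substr(as.character(President$V4),1,3))
X=as.numeric(substr(as.character(President$V8),1,3))
plot(X,Y,xlab="loser",ylab="winner")
# green triangle: winner taller than loser; blue triangle: winner shorter than loser
polygon(c(0,250,0),c(0,250,250),col="light green")
polygon(c(0,250,250),c(0,250,0),col="light blue")
points(X,Y,pch=19,col="red")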
First, we plot the height of the winner versus the height of the loser,
where in the green area, the taller wins, and in the blue area, the shorter wins.
So, obviously, it is not that simple…
But we can go one step further: the height of a candidate might have an
influence only if voters actually see the candidates, so perhaps height
has only a recent influence.
Here is the graph of the evolution of the height of the candidates, with a linear trend, a green one for the winner, and a blue one for the loser.
Z=as.numeric(as.character(President$V1))
plot(c(Z,Z),c(X,Y))
abline(lm(Y~Z),col="light green",lwd=2)
abline(lm(X~Z),col="light blue",lwd=2)
Somehow, the winner is getting taller much faster than the loser (there is an overall increase of the population height over two centuries). Maybe it is time to run some tests, to see if the height can truly be used to predict the winner of US elections,
> Z1=(Y>=X)
> Z2=(Y>X)
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

        1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 2.2222, df = 1, p-value = 0.932
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.7407815
sample estimates:
        p
0.6222222

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

        1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 0.0889, df = 1, p-value = 0.6172
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.6605522
sample estimates:
        p
0.5333333
In 53% of the elections (only), the winner is strictly taller (and in 62% of the elections, he is at least as tall as the loser). Here the null hypothesis is that the taller candidate wins at least half of the time, and with p-values of 0.93 and 0.62 we (statistically) cannot reject it. The pattern is even stronger if we focus only on the elections held after 1918 (following World War I),
> I=Z>1918
> Z1=(Y>=X)[I]
> Z2=(Y>X)[I]
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

        1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 6.2609, df = 1, p-value = 0.9938
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.9049412
sample estimates:
        p
0.7826087

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

        1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 2.7826, df = 1, p-value = 0.9524
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.8423696
sample estimates:
        p
0.6956522
In almost 80% of the elections following WWI, the winner was at least as tall as the loser (and strictly taller in about 70% of them). I guess I have here a nice and simple model to predict who will win the election next year…
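The tests above take "less" as the alternative, so a large p-value only says that we cannot reject the idea that the taller candidate wins at least half of the time. A complementary check, which is not part of the original analysis, is to put "the taller wins more than half of the time" as the alternative. The sketch below does this with an exact binomial test. The counts are an assumption: they are back-calculated from the proportions printed above (0.6222 and 0.7826 correspond to 28 out of 45 and 18 out of 23), and ties are counted in favour of the taller candidate, as in Z1.

```r
# Counts back-calculated from the printed proportions (an assumption,
# not read from the data file): elections where winner >= loser in height.
wins_all <- 28      # 28 / 45 = 0.6222222
n_all <- 45
wins_recent <- 18   # 18 / 23 = 0.7826087
n_recent <- 23

# Exact one-sided binomial tests: H0 p = 1/2 against H1 p > 1/2
test_all <- binom.test(wins_all, n_all, p = 1/2, alternative = "greater")
test_recent <- binom.test(wins_recent, n_recent, p = 1/2, alternative = "greater")

p_all <- test_all$p.value
p_recent <- test_recent$p.value

# Print both p-values side by side
print(c(all = p_all, post1918 = p_recent))
```

With these counts, the post-1918 p-value comes out much smaller than the full-sample one, which is the same message as the proportions above, read in the other direction. The subsample is small (23 elections), so this should be taken as an illustration rather than as evidence for a prediction model.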