# Who will be the next President of the US?

First published on **Freakonometrics - Tag - R-english**, and kindly contributed to R-bloggers.


A lot of weird facts (?) can be found on the internet. For instance, about the height of the winner of presidential elections in the US: the taller candidate always wins… “Still, being short does, on average, hurt a person’s prospects… The tall guy gets the girl. The taller presidential candidate almost always wins.” (here) “from 1900 to 1968 the man elected U.S. president was always the taller of the two candidates.” (there) or “I remember the subversive effect the observation had on me that in every U.S. presidential race, the taller of the two candidates had been elected” (here).

Well, if this were true, it would be simple to build a prediction model. But perhaps we should get back to real data. It is possible to build a dataset simply by copying a table with the heights of US presidents (here), or more precisely the heights of the winner and of the loser(s) of each election.

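```r
# Read the table; V4 and V8 hold the heights of the winner and of the loser
President=read.table("http://freakonometrics.blog.free.fr/public/data/us-president-height.csv",skip=3)
# Keep the first three characters of each height and convert them to numbers
Y=as.numeric(substr(as.character(President$V4),1,3))
X=as.numeric(substr(as.character(President$V8),1,3))
plot(X,Y,xlab="loser",ylab="winner")
# Green triangle above the diagonal (winner taller), blue triangle below it
polygon(c(0,250,0),c(0,250,250),col="light green")
polygon(c(0,250,250),c(0,250,0),col="light blue")
# Redraw the points on top of the two triangles
points(X,Y,pch=19,col="red")
```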

First, we plot the height of the winner versus the height of the loser, where in the green area, the taller wins, and in the blue area, the shorter wins.

So, obviously, it is not *that* simple…
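The picture can also be summarised by counting how many elections fall in each region. The lines below are only a minimal sketch: `winner` and `loser` are hypothetical names, and the five heights are made up for illustration (they are not taken from the dataset).

```r
# Made-up heights in cm, for illustration only (not the real data)
winner <- c(188, 175, 182, 170, NA)
loser  <- c(180, 178, 182, 165, 172)

# Sign of the height difference tells which region each election falls in
d <- winner - loser
counts <- c(taller_wins  = sum(d > 0,  na.rm = TRUE),
            same_height  = sum(d == 0, na.rm = TRUE),
            shorter_wins = sum(d < 0,  na.rm = TRUE))
counts
```

With the vectors defined earlier, `winner` would be `Y` and `loser` would be `X`; elections with a missing height are simply left out of the counts.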

But we can go one step further: the size of the candidate might have an influence if electors actually see the candidates, so perhaps the height has only a *recent* influence.

Here is the graph of the evolution of the height of the candidates, with a linear trend, a green one for the winner, and a blue one for the loser.

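```r
# Election year, read from the first column
Z=as.numeric(as.character(President$V1))
# Heights of losers (X) and winners (Y) against the election year
plot(c(Z,Z),c(X,Y))
# Linear trends: green for the winner, blue for the loser
abline(lm(Y~Z),col="light green",lwd=2)
abline(lm(X~Z),col="light blue",lwd=2)
```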
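The two trend lines can also be compared numerically through their slopes. Here is a small sketch, where `slope_per_century()` is a hypothetical helper (not part of the original code) and the check uses a toy series that is exactly linear.

```r
# Hypothetical helper: slope of the fitted linear trend, in cm per century
slope_per_century <- function(height, year) {
  fit <- lm(height ~ year)   # same kind of model as in abline(lm(...))
  100 * coef(fit)[["year"]]
}

# Toy check: 5 cm gained every 50 years, i.e. 10 cm per century
slope_per_century(c(170, 175, 180), c(1900, 1950, 2000))
```

With the data above, `slope_per_century(Y, Z) - slope_per_century(X, Z)` would give the gap between the trend of the winner and the trend of the loser.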

Somehow, the winner is getting taller much faster than the loser (on top of an overall increase in the height of the population over two centuries). Maybe it is time to run some tests, to see if height can truly be used to predict the winner of US elections:

```r
> Z1=(Y>=X)
> Z2=(Y>X)
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 2.2222, df = 1, p-value = 0.932
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.7407815
sample estimates:
        p
0.6222222

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 0.0889, df = 1, p-value = 0.6172
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.6605522
sample estimates:
        p
0.5333333
```

In 53% of the elections (only), the winner is strictly taller (and in 62% of the elections, he is at least as tall as the loser). Here, we (statistically) do not reject the assumption that the taller wins at least half of the time. But the effect is even stronger if we focus only on recent elections (those held after World War I):

```r
> I=Z>1918
> Z1=(Y>=X)[I]
> Z2=(Y>X)[I]
> prop.test(sum(Z1,na.rm=TRUE),sum(is.na(Z1)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z1, na.rm = TRUE) out of sum(is.na(Z1) == FALSE), null probability 1/2
X-squared = 6.2609, df = 1, p-value = 0.9938
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.9049412
sample estimates:
        p
0.7826087

> prop.test(sum(Z2,na.rm=TRUE),sum(is.na(Z2)==FALSE),p=1/2,alternative="less")

	1-sample proportions test with continuity correction

data:  sum(Z2, na.rm = TRUE) out of sum(is.na(Z2) == FALSE), null probability 1/2
X-squared = 2.7826, df = 1, p-value = 0.9524
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
 0.0000000 0.8423696
sample estimates:
        p
0.6956522
```
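All four tests above use `alternative="less"`, so a large p-value only says that the data are compatible with the taller candidate winning at least half of the time. A possible complement is to test in the other direction. The sketch below assumes the counts implied by the printed proportions (28 and 24 out of 45 for all elections, 18 and 16 out of 23 after 1918); these counts are inferred from the output, not read from the data file.

```r
# Counts inferred from the printed proportions (e.g. 28/45 = 0.6222)
counts <- data.frame(
  period = c("all", "all", "post-1918", "post-1918"),
  rule   = c("Y>=X", "Y>X", "Y>=X", "Y>X"),
  wins   = c(28, 24, 18, 16),
  n      = c(45, 45, 23, 23)
)

# One-sided test of H0: p <= 1/2 against H1: p > 1/2, for each row
counts$p_value <- mapply(
  function(x, n) prop.test(x, n, p = 1/2, alternative = "greater")$p.value,
  counts$wins, counts$n
)
counts
```

These p-values are one minus those printed above: roughly 0.07 and 0.38 over all elections, and roughly 0.006 and 0.048 after 1918. Read this way, evidence that the taller candidate wins more than half of the time shows up only in the recent period.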

In almost 80% of the elections following WWI, the winner was at least as tall as the loser (and strictly taller in about 70% of them). I guess I have here a nice and simple model to predict who will win the elections next year…
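The "model" boils down to a single rule: predict the taller candidate. A minimal sketch, where `predict_taller()` is a hypothetical helper and the names and heights in the usage lines are placeholders rather than real candidates:

```r
# Hypothetical helper: returns the name of the taller candidate,
# or NA when the heights are equal or one of them is missing
predict_taller <- function(name_a, height_a, name_b, height_b) {
  if (is.na(height_a) || is.na(height_b) || height_a == height_b) {
    return(NA_character_)
  }
  if (height_a > height_b) name_a else name_b
}

predict_taller("Candidate A", 185, "Candidate B", 180)
predict_taller("Candidate A", 180, "Candidate B", 180)
```

Going by the post-1918 proportions above, such a rule would have been right in roughly 70% of the elections, and it gives no answer when both candidates have the same height.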
