[This article was first published on Nor Talk Too Wise » R, and kindly contributed to R-bloggers].

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem.  The dataset is the basis for a paper the Economist publishes every two years.  Because of this biennial schedule, there are estimates of the “democratic-ness” of the world’s countries for 2006, 2008 and 2010.  What happened between those years, God only knows.  I dumped the data into a CSV file and merged it with the Polity data.  Since they’re both measures of democracy, I hypothesize (again) that they should be fairly linearly correlated.  Shall we take a peek?

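# Load the merged EIU Democracy Index / Polity data
EIUmergePolity <- read.csv("http://nortalktoowise.com/wp-content/uploads/2011/07/EIUmergePolity.csv")
# attach() lets the columns (Overall, polity2) be used by name below
attach(EIUmergePolity)
# Regress the Polity score on the EIU overall score
model1 <- lm(polity2 ~ Overall)
summary(model1)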
Call:
lm(formula = polity2 ~ Overall)

Residuals:
    Min      1Q  Median      3Q     Max
-9.9426 -2.4366  0.1764  2.1389 10.5728

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.39327    0.42154  -22.28   <2e-16 ***
Overall      2.41504    0.07162   33.72   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.475 on 470 degrees of freedom
(35 observations deleted due to missingness)
Multiple R-squared: 0.7076,	Adjusted R-squared: 0.7069
F-statistic:  1137 on 1 and 470 DF,  p-value: < 2.2e-16
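
A side note on reading that output: for a regression with a single predictor, the multiple R-squared is simply the squared Pearson correlation between the two variables, so 0.7076 corresponds to a correlation of roughly 0.84.  Here is a minimal sketch of that check.  It runs on simulated stand-in data (the data frame `sim` and every number in it are made up for illustration, not the EIU or Polity figures).

```r
# Simulated stand-in data: NOT the real EIU/Polity values
set.seed(42)
sim <- data.frame(Overall = runif(200, min = 0, max = 10))
sim$polity2 <- -9 + 2.4 * sim$Overall + rnorm(200, sd = 3)

fit <- lm(polity2 ~ Overall, data = sim)

# R-squared as reported by summary()
r2 <- summary(fit)$r.squared

# Pearson correlation; use = "complete.obs" drops rows with NAs,
# which mirrors lm()'s default handling of missing values
r <- cor(sim$Overall, sim$polity2, use = "complete.obs")

# The two quantities agree up to floating-point error
all.equal(r2, r^2)
```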

Yet again, we find a good linear model which just isn’t all that impressive.  An R-squared of about 0.71 is not half bad for an explanatory model, but these are two different measures of the same thing, so they should be more closely correlated than that.  Let’s take a look, shall we?

plot(Overall, polity2, main="Democracy Index over Polity2 Score")
abline(model1)

Damn! It looks kind of non-linear. Again.  Maybe fitting a quadratic curve to the data, the same technique I used last time, will get us closer.

model2 <- lm(polity2 ~ Overall + I(Overall^2))
summary(model2)
Call:
lm(formula = polity2 ~ Overall + I(Overall^2))

Residuals:
     Min       1Q   Median       3Q      Max
-10.9995  -1.3065   0.3161   1.5667  11.9635

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -14.51086    0.88953 -16.313  < 2e-16 ***
Overall        4.70622    0.36131  13.026  < 2e-16 ***
I(Overall^2)  -0.21243    0.03289  -6.459 2.64e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.334 on 469 degrees of freedom
(35 observations deleted due to missingness)
Multiple R-squared: 0.7314,	Adjusted R-squared: 0.7303
F-statistic: 638.7 on 2 and 469 DF,  p-value: < 2.2e-16
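
Since the linear model is nested inside the quadratic one, an F-test is one way to put a number on the improvement.  The sketch below runs `anova()` on simulated stand-in data (`sim`, `lin` and `quad` are hypothetical names, and the numbers are invented, not the real dataset).  With only one added term, the F statistic should equal the square of the quadratic term’s t value.

```r
# Simulated stand-in data: NOT the real EIU/Polity values
set.seed(1)
sim <- data.frame(Overall = runif(300, min = 0, max = 10))
sim$polity2 <- -14 + 4.7 * sim$Overall - 0.2 * sim$Overall^2 +
  rnorm(300, sd = 3)

lin  <- lm(polity2 ~ Overall, data = sim)
quad <- lm(polity2 ~ Overall + I(Overall^2), data = sim)

# F-test of the nested models: does the squared term reduce the
# residual sum of squares by more than chance would?
cmp <- anova(lin, quad)
print(cmp)

# With a single added term, F equals the squared t value of that term
f_stat <- cmp$F[2]
t_quad <- summary(quad)$coefficients["I(Overall^2)", "t value"]
all.equal(f_stat, t_quad^2)
```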

Doesn’t help much, I’m sad to say.  We see an increase in the model’s descriptive value (R-squared goes from about 0.71 to about 0.73), but it’s really small.  If we draw the curve on the scatterplot, will it at least look better?

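plot(Overall, polity2, main="Democracy Index over Polity2 Score")
# Fine grid over the 0-10 index range so the curve is drawn smoothly
x <- seq(0, 10, by = 0.1)
# Fitted values: intercept + b1*x + b2*x^2
y <- as.vector(coef(model2) %*% rbind(1, x, x^2))
lines(x, y, lwd=2)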
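
An alternative to multiplying the coefficients out by hand is to let `predict()` evaluate the model on a grid.  This is only a sketch on simulated stand-in data (`sim`, `quad` and `grid` are hypothetical names, and the values are invented), and it fits the model with `data =` so that nothing needs to be attached.

```r
# Simulated stand-in data: NOT the real EIU/Polity values
set.seed(7)
sim <- data.frame(Overall = runif(300, min = 0, max = 10))
sim$polity2 <- -14 + 4.7 * sim$Overall - 0.2 * sim$Overall^2 +
  rnorm(300, sd = 3)

quad <- lm(polity2 ~ Overall + I(Overall^2), data = sim)

# Evaluate the fitted curve on an evenly spaced grid of index values
grid <- data.frame(Overall = seq(0, 10, by = 0.1))
grid$fit <- predict(quad, newdata = grid)

# Scatterplot with the fitted quadratic drawn on top
plot(sim$Overall, sim$polity2, xlab = "Overall", ylab = "polity2")
lines(grid$Overall, grid$fit, lwd = 2)
```

Because `predict()` rebuilds the `I(Overall^2)` term from the formula, the grid only needs the `Overall` column.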

Yeah, it looks a little better, but there’s not much that can be done.  It seems that there’s a fundamental inconsistency in the way we measure governance, particularly at the extreme ends of the spectrum.  Perhaps this is characteristic of data measuring the same thing in different ways.  Anyone know any literature examining this sort of thing?
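
One quick way to probe whether the two indices at least agree on the ordering of countries, without assuming a straight-line relationship, is a rank correlation.  Below is a sketch on simulated stand-in data (`sim` is hypothetical and the curved relationship is invented).  The made-up `polity2` is clipped to the -10 to 10 range of the Polity scale to mimic a bounded measure.

```r
# Simulated stand-in data: NOT the real EIU/Polity values
set.seed(3)
sim <- data.frame(Overall = runif(300, min = 0, max = 10))

# Hypothetical curved relationship plus noise
raw <- -14 + 4.7 * sim$Overall - 0.2 * sim$Overall^2 + rnorm(300, sd = 2)

# Clip to the bounds of the polity2 scale
sim$polity2 <- pmax(pmin(raw, 10), -10)

# Pearson measures linear association; Spearman works on ranks, so it
# only asks whether the relationship is monotone
pearson  <- cor(sim$Overall, sim$polity2, method = "pearson")
spearman <- cor(sim$Overall, sim$polity2, method = "spearman")
c(pearson = pearson, spearman = spearman)
```

On the real data, `cor()` would need `use = "complete.obs"`, since 35 rows have missing values.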