(This article was first published on **Nor Talk Too Wise » R**, and kindly contributed to R-bloggers)

Yet again, I have conjured up an (academically) unusual dataset on democracy! This time it’s the Economist Intelligence Unit’s Democracy Index, a weird little gem. The dataset is the basis for a paper the Economist publishes every two years. Because the index is biennial, there are estimates of the “Democratic-ness” of the world’s countries for 2006, 2008 and 2010. What happened between those years, God only knows. I dumped the data into a CSV file and merged it with the Polity data. Since they’re both measures of democracy, I hypothesize (again) that they should be fairly linearly correlated. Shall we take a peek?

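    # Load the merged EIU Democracy Index / Polity data
    EIUmergePolity <- read.csv("http://nortalktoowise.com/wp-content/uploads/2011/07/EIUmergePolity.csv")
    attach(EIUmergePolity)  # make the columns (Overall, polity2) visible by name
    # Regress the Polity2 score on the EIU overall index
    model1 <- lm(polity2 ~ Overall)
    summary(model1)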

    Call:
    lm(formula = polity2 ~ Overall)

    Residuals:
        Min      1Q  Median      3Q     Max
    -9.9426 -2.4366  0.1764  2.1389 10.5728

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -9.39327    0.42154  -22.28   <2e-16 ***
    Overall      2.41504    0.07162   33.72   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.475 on 470 degrees of freedom
      (35 observations deleted due to missingness)
    Multiple R-squared:  0.7076,    Adjusted R-squared:  0.7069
    F-statistic:  1137 on 1 and 470 DF,  p-value: < 2.2e-16
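A quick aside on reading that output: with a single predictor, the Multiple R-squared is just the squared Pearson correlation between the two variables, so 0.7076 corresponds to a correlation of roughly 0.84. Here is a minimal sketch of that identity. It runs on simulated data (the variable names `overall` and `polity` and all the numbers in the simulation are made up for illustration, not taken from the EIU or Polity datasets).

```r
# Simulated stand-in data; NOT the EIU/Polity data from the post.
set.seed(42)
overall <- runif(200, min = 0, max = 10)              # hypothetical 0-10 index
polity  <- -9 + 2.4 * overall + rnorm(200, sd = 3.5)  # hypothetical noisy linear score

fit <- lm(polity ~ overall)

r_squared <- summary(fit)$r.squared
pearson_r <- cor(overall, polity)

# With a single predictor, R-squared equals the squared Pearson correlation.
print(all.equal(r_squared, pearson_r^2))

# The R-squared reported in the post, converted back to a correlation.
print(sqrt(0.7076))
```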

Yet again, we find a good linear model which just isn’t all that impressive. An R-squared of 0.71 is not half bad for an explanatory model, but these are two different measures of the *same thing*. They should be more closely correlated than that. Let’s take a look, shall we?

    plot(Overall, polity2, main="Democracy Index over Polity2 Score")
    abline(model1)

Damn! It looks kind of non-linear. Again. Maybe fitting a quadratic curve, the same technique I used last time, will get us closer.

    model2 <- lm(polity2 ~ Overall + I(Overall^2))
    summary(model2)

    Call:
    lm(formula = polity2 ~ Overall + I(Overall^2))

    Residuals:
         Min       1Q   Median       3Q      Max
    -10.9995  -1.3065   0.3161   1.5667  11.9635

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -14.51086    0.88953 -16.313  < 2e-16 ***
    Overall        4.70622    0.36131  13.026  < 2e-16 ***
    I(Overall^2)  -0.21243    0.03289  -6.459 2.64e-10 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.334 on 469 degrees of freedom
      (35 observations deleted due to missingness)
    Multiple R-squared:  0.7314,    Adjusted R-squared:  0.7303
    F-statistic: 638.7 on 2 and 469 DF,  p-value: < 2.2e-16
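Because the linear model is nested inside the quadratic one, the two fits can also be compared directly with an F-test via `anova()`. The sketch below is only an illustration on simulated data: the variable names and the coefficients used to generate the data are assumptions, chosen loosely to resemble the shape of the fit above, and are not the post's dataset.

```r
# Simulated stand-in data with mild curvature; NOT the data from the post.
set.seed(1)
overall <- runif(300, min = 0, max = 10)
polity  <- -14 + 4.7 * overall - 0.2 * overall^2 + rnorm(300, sd = 3)

linear    <- lm(polity ~ overall)
quadratic <- lm(polity ~ overall + I(overall^2))

# F-test of the nested models: does the squared term reduce residual variance
# by more than chance would explain?
comparison <- anova(linear, quadratic)
print(comparison)

# Adjusted R-squared side by side, to see how large the gain actually is.
print(c(linear    = summary(linear)$adj.r.squared,
        quadratic = summary(quadratic)$adj.r.squared))
```

A significant F-test and a small gain in R-squared are not contradictory: with several hundred observations, even a modest curvature is easy to detect.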

Doesn’t help much, I’m sad to say. We see a small increase in the model’s descriptive value (R-squared goes from 0.71 to 0.73), but it’s *really* small. If we draw the curve on the scatterplot, will it at least look better?

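    plot(Overall, polity2, main="Democracy Index over Polity2 Score")
    x <- seq(0, 10, by = 0.1)  # fine grid so the curve is drawn smoothly
    # Fitted values: intercept + b1*x + b2*x^2
    y <- drop(coef(model2) %*% rbind(1, x, x^2))
    lines(x, y, lwd = 2)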

Yeah, it looks a little better, but there’s not much that can be done. It seems that there’s a fundamental inconsistency in the way we measure governance, particularly at the extreme ends of the spectrum. Perhaps this is characteristic of data measuring the same thing in different ways. Anyone know any literature examining this sort of thing?
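For anyone who wants to redo the quadratic fit without `attach()`, here is a minimal sketch that passes the data frame explicitly and lets `predict()` compute the curve. The data frame `dat` is a simulated stand-in: its column names mirror the ones used above, but the values are invented, so the plot will not match the real one.

```r
# Simulated stand-in for the merged data frame; column names mirror the post,
# but the values are made up.
set.seed(7)
dat <- data.frame(Overall = runif(150, min = 0, max = 10))
dat$polity2 <- -14 + 4.7 * dat$Overall - 0.2 * dat$Overall^2 + rnorm(150, sd = 3)

# Passing data = avoids attach() and keeps variable lookup explicit.
model2 <- lm(polity2 ~ Overall + I(Overall^2), data = dat)

# Predict on a fine grid so the fitted curve is drawn smoothly.
grid <- data.frame(Overall = seq(0, 10, by = 0.1))
grid$fit <- predict(model2, newdata = grid)

plot(polity2 ~ Overall, data = dat, main = "Democracy Index over Polity2 Score")
lines(grid$Overall, grid$fit, lwd = 2)
```

Using `predict()` means the plotting code does not need to change if the model formula does.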
