Russian elections

March 6, 2012
(This article was first published on Wiekvoet, and kindly contributed to R-bloggers)

Just a few words about the Russian election. I read this entry http://www.badscience.net/2012/03/is-there-statistical-evidence-of-fraud-in-the-russian-election-data/ and decided to look for myself. To me, the data do not seem good enough to answer the fraud question.


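Downloading the data, reading them in and just having a look:
> library(gdata)  # assumption: read.xls() here is the one from the gdata package
> r1 <- read.xls("xxxxxxxxxxxxxx")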

> head(r1)
            projecturl id     updt region uik obstrusted INVALID VALID
1 http://sms.golos.org  1 38324.72     27 650          1       4   323
2 http://sms.golos.org  2 38689.09     25 216          0       9   927
3 http://sms.golos.org  3 38324.72     38 732          1       7  1282
4 http://sms.golos.org  4 38324.72     25 291          0      14  1185
5 http://sms.golos.org  5 38324.72     38 668          0      15  1510
6 http://sms.golos.org  6 38324.72     27 198          0      15  1889
  Zhirinovsky Zyuganov Mironov Prokhorov Putin
1          42       40       3        24   214
2          88      229      58        92   460
3          80      333      46       150   673
4         129      315      67       175   499
5          76      395      70       227   742
6         127      353     115       379   915


At first sight the data look good. Some columns are unknown to me; region, VALID and the columns for the contenders look pretty straightforward.


Some regions occur only once, others quite often. Some are missing completely.

> regs <- xtabs(~ region,data=r1)
> names(regs[regs==1])
[1] "13" "32" "43" "65" "75" "86" "87"
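The regions that are missing completely can be listed in the same way. This is only a minimal sketch: the region vector below is hypothetical and made up for illustration (in the post it would be r1$region), and it assumes, as the plot code in this post does, that region codes are whole numbers running from min(region) to max(region).

```r
# Hypothetical region codes, for illustration only; with the real data use r1$region
region <- c(27, 25, 38, 25, 38, 27, 30, 25)

# Number of reports per region
regs <- table(region)

# Regions that occur exactly once
once <- names(regs[regs == 1])

# Regions inside the observed range that never occur
# (assumes codes are consecutive integers between min and max)
missing_regions <- setdiff(seq(min(region), max(region)), unique(region))

print(once)
print(missing_regions)
```

Whether a gap in the numbering means a region without any report or simply an unused code cannot be told from these data alone.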

There is quite some difference in the number of valid votes per region, as the next plot shows. To someone who does not know this field, that looks very odd.
plot(xtabs(VALID ~ factor(region,levels=min(region):max(region)),data=r1))
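To make clear what that plot displays: with a variable on the left-hand side of the formula, xtabs() sums that variable per level, and forcing the factor levels to the full range keeps regions without any report in the table as zero. A small sketch, using only the six rows printed by head(r1) above (the data frame name demo is mine, not part of the original analysis):

```r
# The six rows printed by head(r1) above; not the full data set
demo <- data.frame(
  region = c(27, 25, 38, 25, 38, 27),
  VALID  = c(323, 927, 1282, 1185, 1510, 1889)
)

# Sum of VALID per region; levels span the whole range, so gaps show up as 0
tab <- xtabs(VALID ~ factor(region, levels = min(region):max(region)),
             data = demo)
print(tab)
```

In this small subset only regions 25, 27 and 38 have a non-zero total; every other level in between stays at zero, which is how the gaps become visible in the plot.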

And if we expect VALID = Zhirinovsky + Zyuganov + Mironov + Prokhorov + Putin, that does not hold either.

r1$myValid <- with(r1, Zhirinovsky + Zyuganov + Mironov + Prokhorov + Putin)
plot(myValid ~ VALID,data=r1)
The data just do not add up.
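Besides the plot, the mismatch can be put in numbers by taking the difference per row. A minimal sketch on the six rows printed by head(r1) above (the names demo and diff are mine, chosen for illustration):

```r
# The six rows printed by head(r1) above; not the full data set
demo <- data.frame(
  VALID       = c(323, 927, 1282, 1185, 1510, 1889),
  Zhirinovsky = c(42, 88, 80, 129, 76, 127),
  Zyuganov    = c(40, 229, 333, 315, 395, 353),
  Mironov     = c(3, 58, 46, 67, 70, 115),
  Prokhorov   = c(24, 92, 150, 175, 227, 379),
  Putin       = c(214, 460, 673, 499, 742, 915)
)

# Sum over the five contenders, as in the post
demo$myValid <- with(demo, Zhirinovsky + Zyuganov + Mironov + Prokhorov + Putin)

# Difference between the reported and the computed number of valid votes
demo$diff <- demo$VALID - demo$myValid

# How many rows agree, and which rows do not
print(table(demo$diff == 0))
print(demo[demo$diff != 0, ])
```

For these six rows the sums agree, so the points off the diagonal in the plot must come from other rows. Running the same lines on r1 instead of demo would list them.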

Conclusion
Either the data are incomplete and raise too many questions to even think about looking for fraud, or these are the true data, they really are as bad as they look here, and the fraud is obvious.

