An R Package for the “controversial” counts of registered voters in Uganda

February 12, 2016

(This article was first published on R – Data Science Africa, and kindly contributed to R-bloggers)

Some members of the Ugandan media have claimed that the counts of registered voters for the 2016 General Election contain almost 20,000 “ghost voters”. This is according to their analyses (using Excel) of data released by the Uganda Electoral Commission (EC), the body charged with conducting a free and fair election. With voting just 8 days away, I wanted to compile this data into an R package and make it available to the general public (R users and data analysts) to analyze and draw their own conclusions. It’s open data now!

The package is currently available on GitHub here: https://github.com/Emaasit/UGvoters16

Motivation for Developing this R Package

Several members of the media claimed to have found discrepancies in the Uganda Electoral Commission’s voter counts. Their claims created a storm on social media, which caught my attention.

[Images: media-claim, media-claim5, media-claim3]

With the data readily available in PDF format on the EC’s website, I wanted to compile it into an R package so that others can analyze it and draw their own conclusions.

How to use the Package

Before you can use the data in R, you need to install the package from GitHub using the following commands:

install.packages("devtools")
# GitHub no longer serves the unauthenticated git:// protocol,
# so install from the repository over HTTPS instead.
devtools::install_github("Emaasit/UGvoters16")
library(UGvoters16)

The package contains two datasets:

  1. “UGvoters16”: This is the original dataset released by the Commission. It contains 14 variables and 280,010 observations.
  2. “analyzed”: This dataset contains an extra column (“ANALYZED_VOTER_COUNT”) added by a member of the media to make their comparison.

After loading the library, you can create local data frames using the following commands:

df1 <- UGvoters16
df2 <- analyzed

## You can take a glimpse of the data by using 
## the head() function.
head(df1)
##   SER_NO DIST_CODE DISTRICT_NAME EA_CODE       EA_NAME SCTY_CODE
## 1      1        01          APAC     002 KWANIA COUNTY        01
## 2      2        01          APAC     002 KWANIA COUNTY        01
## 3      3        01          APAC     002 KWANIA COUNTY        01
## 4      4        01          APAC     002 KWANIA COUNTY        01
## 5      5        01          APAC     002 KWANIA COUNTY        01
## 6      6        01          APAC     002 KWANIA COUNTY        01
##   SCOUNTY_NAME PAR_CODE PARISH_NAME PS_CODE             PS_NAME
## 1        ADUKU       01      ADYEDA      01       ADYEDA CENTRE
## 2        ADUKU       01      ADYEDA      02 APORWEGI P.7 SCHOOL
## 3        ADUKU       01      ADYEDA      03        ADYEDA IMALO
## 4        ADUKU       02       ALIRA      01             ALIRA B
## 5        ADUKU       02       ALIRA      02             AKOT  A
## 6        ADUKU       02       ALIRA      03               OLEKE
##   NO_OF_FEMALES NO_OF_MALES EC_VOTER_COUNTS ANALYZED_VOTER_COUNT
## 1           134         143             277                  277
## 2           379         323             703                  702
## 3           164         157             322                  321
## 4           461         411             872                  872
## 5           386         364             750                  750
## 6           443         383             826                  826
head(df2)
##   SER_NO DIST_CODE DISTRICT_NAME EA_CODE       EA_NAME SCTY_CODE
## 1      1         1          APAC       2 KWANIA COUNTY         1
## 2      2         1          APAC       2 KWANIA COUNTY         1
## 3      3         1          APAC       2 KWANIA COUNTY         1
## 4      4         1          APAC       2 KWANIA COUNTY         1
## 5      5         1          APAC       2 KWANIA COUNTY         1
## 6      6         1          APAC       2 KWANIA COUNTY         1
##   SCOUNTY_NAME PAR_CODE PARISH_NAME PS_CODE             PS_NAME
## 1        ADUKU        1      ADYEDA       1       ADYEDA CENTRE
## 2        ADUKU        1      ADYEDA       2 APORWEGI P.7 SCHOOL
## 3        ADUKU        1      ADYEDA       3        ADYEDA IMALO
## 4        ADUKU        2       ALIRA       1             ALIRA B
## 5        ADUKU        2       ALIRA       2             AKOT  A
## 6        ADUKU        2       ALIRA       3               OLEKE
##   NO_OF_FEMALES NO_OF_MALES EC_VOTER_COUNTS ANALYZED_VOTER_COUNT
## 1            43          51             240                  277
## 2           312         251             687                  702
## 3            76          66             287                  321
## 4           404         349             869                  872
## 5           320         296             739                  750
## 6           384         317             819                  826
# what are the column names
names(df1)
##  [1] "SER_NO"               "DIST_CODE"            "DISTRICT_NAME"       
##  [4] "EA_CODE"              "EA_NAME"              "SCTY_CODE"           
##  [7] "SCOUNTY_NAME"         "PAR_CODE"             "PARISH_NAME"         
## [10] "PS_CODE"              "PS_NAME"              "NO_OF_FEMALES"       
## [13] "NO_OF_MALES"          "EC_VOTER_COUNTS"      "ANALYZED_VOTER_COUNT"
names(df2)
##  [1] "SER_NO"               "DIST_CODE"            "DISTRICT_NAME"       
##  [4] "EA_CODE"              "EA_NAME"              "SCTY_CODE"           
##  [7] "SCOUNTY_NAME"         "PAR_CODE"             "PARISH_NAME"         
## [10] "PS_CODE"              "PS_NAME"              "NO_OF_FEMALES"       
## [13] "NO_OF_MALES"          "EC_VOTER_COUNTS"      "ANALYZED_VOTER_COUNT"
# sum the analyzed voter counts across all rows
sum(df2$ANALYZED_VOTER_COUNT)
## [1] 15277197
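
In the six rows of head(df1) printed above, ANALYZED_VOTER_COUNT matches NO_OF_FEMALES + NO_OF_MALES. That is an inference from the printed output, not something the data source states. Under that assumption, the comparison could be sketched roughly as follows. The data frame sample_df is a hand-typed copy of those six rows so the example runs without the package, and count_discrepancy() is a hypothetical helper, not a function from UGvoters16.

```r
# Six rows copied by hand from the head(df1) output above.
# With the package installed, the full data would be df1 <- UGvoters16.
sample_df <- data.frame(
  PS_NAME = c("ADYEDA CENTRE", "APORWEGI P.7 SCHOOL", "ADYEDA IMALO",
              "ALIRA B", "AKOT  A", "OLEKE"),
  NO_OF_FEMALES   = c(134, 379, 164, 461, 386, 443),
  NO_OF_MALES     = c(143, 323, 157, 411, 364, 383),
  EC_VOTER_COUNTS = c(277, 703, 322, 872, 750, 826),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of UGvoters16): recompute each polling
# station's count from the gender columns and subtract it from the
# total published by the EC.
count_discrepancy <- function(df) {
  recomputed <- df$NO_OF_FEMALES + df$NO_OF_MALES
  data.frame(
    PS_NAME         = df$PS_NAME,
    RECOMPUTED      = recomputed,
    EC_VOTER_COUNTS = df$EC_VOTER_COUNTS,
    DIFFERENCE      = df$EC_VOTER_COUNTS - recomputed,
    stringsAsFactors = FALSE
  )
}

result <- count_discrepancy(sample_df)

# Keep only the polling stations where the two counts disagree
mismatches <- result[result$DIFFERENCE != 0, ]
print(mismatches)

# Net difference across the sample rows
total_difference <- sum(result$DIFFERENCE)
print(total_difference)
```

Passing the full dataset to the same helper would give a per-station difference that can then be summed or grouped by DISTRICT_NAME. Whether a non-zero difference indicates a real problem or just a data-entry artifact is a question for the analyst.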

Closing Remarks

With this data now readily available in an R package, data analysts and data journalists can perform their own analyses in R, which provides more tools and methods than a spreadsheet.

