What happened to six million voters?

This article was first published on eKonometrics and kindly contributed to R-bloggers.

The recent elections in Pakistan on May 11 were a great success by all accounts. Despite threats of violence by Al-Qaeda and its local franchises in Pakistan against those who would vote, millions of Pakistanis stepped out to vote for an elected government. The Election Commission of Pakistan (ECP) claimed a voter turnout of 60%.

A 60% turnout among the 84.2 million registered voters in the 262 ridings of the National Assembly for which the ECP reported results would imply 50.5 million votes polled. However, the ECP's own data show 44.9 million votes, a gap of approximately 5.7 million votes. The actual turnout was thus close to 53%.
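The arithmetic behind these figures can be checked with a few lines of R. This is a minimal sketch that uses only the rounded numbers quoted in the text; the variable names are illustrative and not taken from the original analysis.

```r
# Rounded figures quoted in the article
registered <- 84.2e6  # registered voters in the 262 reported ridings
polled     <- 44.9e6  # votes polled according to the ECP's own data
claimed    <- 0.60    # turnout claimed by the ECP

expected_votes <- claimed * registered       # votes implied by a 60% turnout
actual_turnout <- polled / registered        # turnout implied by votes polled
gap            <- expected_votes - polled    # shortfall against the claim

round(expected_votes / 1e6, 1)  # 50.5 (million)
round(100 * actual_turnout, 1)  # 53.3 (percent)
round(gap / 1e6, 1)             # 5.6 (million)
```

With these rounded inputs the gap comes out at about 5.6 million; the slightly larger figure quoted in the text presumably reflects the unrounded counts.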


I used R to scrape the data for the 262 ridings, which the ECP reported on separate web pages. The R code is presented below.


library(XML)  # provides readHTMLTable()

# Get the URL prefix

# Loop through the 272 ridings
for (i in 1:272) {
  # Get the riding number
  u2 <- i
  # Complete the URL address
  # Read the results table (the eighth table on the page)
  ridedata <- readHTMLTable(url2, header = TRUE, which = 8, stringsAsFactors = FALSE)
  # Read the raw HTML page
  web_page <- readLines(url2)
  # Pull out the line with the riding name using the identifier "Specialheading"
  ridename <- web_page[grep("Specialheading", web_page)]
  # Get the starting position of the riding name
  startx <- regexpr("(", ridename, fixed = TRUE)
  # Get the end position of the riding name
  endx <- regexpr("…", ridename, fixed = TRUE)
  endx <- endx[1] - 2
  # Generate the riding name
  # Store this riding's data in its own object (fname1, fname2, ...)
  assign(paste0("fname", u2), cbind(ridedata, riding = i, rname = ridename))
}
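The name-extraction step can be illustrated on a single line of HTML. The markup and riding name below are hypothetical, since the actual ECP page source is not reproduced here; the sketch only shows how regexpr() and substr() combine to pull out the text between two delimiters.

```r
# Hypothetical line of HTML; the real ECP markup may differ
html_line <- '<td class="Specialheading">NA-1 (Peshawar-I)</td>'

# Position of the opening and closing parentheses (fixed = TRUE: literal match)
startx <- regexpr("(", html_line, fixed = TRUE)[1]
endx   <- regexpr(")", html_line, fixed = TRUE)[1]

# Keep only the text strictly between the two delimiters
rname <- substr(html_line, startx + 1, endx - 1)
rname  # "Peshawar-I"
```

The offsets added to or subtracted from the two positions depend on which delimiters are searched for, so they would need adjusting to the real page layout.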

I used a simple rbind command to assemble the data into one large file after first storing each riding's data in separate files. I did this because the server timed out several times during execution, and it let me restart from the riding where the run failed rather than from the beginning every time.
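A restartable version of this workflow might look like the sketch below. It is not the original code: fetch_riding() is a stand-in that returns made-up rows instead of contacting the ECP server, and the file names, the RDS format, and the five-riding loop are all assumptions chosen for illustration.

```r
# Stand-in for the real download step; returns made-up rows for riding i
fetch_riding <- function(i) {
  data.frame(candidate = c("A", "B"),
             votes     = c(100L * i, 50L * i),
             riding    = i,
             stringsAsFactors = FALSE)
}

# One file per riding, kept in a scratch directory
out_dir <- file.path(tempdir(), "ridings")
dir.create(out_dir, showWarnings = FALSE)

for (i in 1:5) {
  out_file <- file.path(out_dir, sprintf("riding_%03d.rds", i))
  if (file.exists(out_file)) next  # already fetched in an earlier run
  # A failed request (e.g. a timeout) yields NULL instead of stopping the loop
  result <- tryCatch(fetch_riding(i), error = function(e) NULL)
  if (!is.null(result)) saveRDS(result, out_file)
}

# Assemble all stored ridings into one data frame with rbind
files       <- list.files(out_dir, pattern = "\\.rds$", full.names = TRUE)
all_ridings <- do.call(rbind, lapply(files, readRDS))
nrow(all_ridings)  # 10 (five ridings, two rows each)
```

Because ridings that already have a file are skipped, rerunning the loop after a failure only retries the ones that are still missing.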
