What happened to six million voters?

May 22, 2013

(This article was first published on eKonometrics, and kindly contributed to R-bloggers)

The recent elections in Pakistan on May 11 were a great success by all accounts. Despite threats of violence from Al-Qaeda and its local franchises against anyone who voted, millions of Pakistanis stepped out to vote for an elected government. The Election Commission of Pakistan (ECP) claimed a voter turnout of 60%.

A 60% turnout among the 84.2 million registered voters in the 262 National Assembly ridings for which the ECP reported results would imply about 50.5 million votes polled. However, the ECP’s own data show only 44.9 million votes, a gap of approximately 5.7 million. The actual turnout was thus close to 53%.
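The arithmetic behind these figures can be checked in a few lines of R. This is only a sketch using the rounded totals quoted above; the variable names are mine. With rounded inputs the gap comes out nearer 5.6 million than 5.7 million, presumably because the published totals are themselves rounded.

```r
# Rounded figures quoted in the text (millions of voters / votes)
registered    <- 84.2   # registered voters in the 262 reported ridings
votes_polled  <- 44.9   # votes in the ECP's published results
claimed_share <- 0.60   # turnout claimed by the ECP

# Votes implied by the claimed turnout
expected_votes <- claimed_share * registered        # about 50.5

# Shortfall between implied and reported votes
gap <- expected_votes - votes_polled                # about 5.6

# Turnout implied by the reported votes, in percent
actual_turnout <- 100 * votes_polled / registered   # about 53.3

print(round(c(expected = expected_votes, gap = gap, turnout = actual_turnout), 1))
```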


I used R to siphon off data for 262 ridings, which ECP reported on separate web pages. The R code is presented below.

library(XML)

# URL prefix shared by all riding result pages
u1 <- "http://www.ecp.gov.pk/electionresult/Search.aspx?constituency=NA&constituencyid=NA-"

# Loop through the 272 ridings
for (i in 1:272) {

  # Riding number
  u2 <- i

  # Complete the URL
  url2 <- paste0(u1, u2)

  # Read the results table (the eighth table on the page)
  ridedata <- readHTMLTable(url2, header = TRUE, which = 8, stringsAsFactors = FALSE)

  # Read the raw HTML of the page
  web_page <- readLines(url2)

  # Pull out the line holding the riding name, identified by "Specialheading"
  ridename <- web_page[grep("Specialheading", web_page)]

  # Start position of the riding name: the character after the first "("
  startx <- regexpr("(", ridename, fixed = TRUE)
  startx <- startx[1] + 1

  # End position of the riding name: two characters before "<span"
  endx <- regexpr("<span", ridename)
  endx <- endx[1] - 2

  # Extract the riding name
  ridename <- substr(ridename, startx, endx)

  # Store the table, tagged with riding number and name, as fname1, fname2, ...
  assign(paste0("fname", u2), cbind(ridedata, riding = i, rname = ridename))
}

After first storing each riding's data in a separate file, I used a simple rbind command to assemble everything into one large file. I did this because the server timed out several times during execution; keeping the ridings separate let me restart from the riding where the run failed, rather than from the beginning every time.
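One way to make the restart-and-combine workflow described above automatic is sketched below. It is not the code used for this post. The function fetch_riding() is a hypothetical stand-in that returns made-up rows so the example runs offline; in practice it would wrap the readHTMLTable() call from the loop. The cache directory, file names and column names are likewise assumptions.

```r
# Hypothetical stand-in for the per-riding scrape (returns dummy rows).
fetch_riding <- function(i) {
  data.frame(candidate = c("A", "B"),
             votes     = c(100L * i, 50L * i),
             riding    = i,
             stringsAsFactors = FALSE)
}

# Assumed cache location: one .rds file per riding
out_dir <- file.path(tempdir(), "ridings")
dir.create(out_dir, showWarnings = FALSE)

scrape_all <- function(ids) {
  for (i in ids) {
    f <- file.path(out_dir, sprintf("riding_%03d.rds", i))
    if (file.exists(f)) next            # fetched in an earlier run; skip
    res <- tryCatch(fetch_riding(i), error = function(e) NULL)
    if (is.null(res)) next              # failed (e.g. timeout); retry on the next run
    saveRDS(res, f)
  }
}

# Rerunning this call only fetches ridings that are still missing
scrape_all(1:5)

# Assemble all cached ridings into one data frame with rbind
files <- sort(list.files(out_dir, pattern = "\\.rds$", full.names = TRUE))
all_ridings <- do.call(rbind, lapply(files, readRDS))
```

Because each riding is written to disk as soon as it is fetched, a timeout costs only the riding in progress, and calling scrape_all() again picks up where the previous run stopped.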
