Charting the Defeat of AV using R (and some ggplot2 and merge operations on top)

[This article was first published on Psychwire » R, and kindly contributed to R-bloggers.]

In this post, I’ll be graphing some results from a recent referendum held here in the UK and combining them with the results of a set of local elections that were held at the same time. I’ll give some examples of plotting with ggplot2 and will also show how to merge datasets.

At the outset, I want to point out that this isn’t intended to be a ‘using stats to be political’ post. I just like playing around with data. Don’t for a second assume that I’m trying to say anything meaningful here. It’s for entertainment purposes only.

The Vote, and the Alternative Vote

We have a coalition government in the UK, between the Conservatives and the Liberal Democrats. One thing the Lib Dems have pushed for, and in a rare instance of getting their own way have actually achieved, is having a referendum on changing the voting system here. They wanted to institute Alternative Voting. The vote was cast last week. AV was crushed.

At the same time, votes were cast for the local councils. The people who voted, myself included, were handed two exciting bits of paper to scribble on at the polling station. The Lib Dems lost the most ground in the past 30 years.

Charting the defeat of AV using R

So I saw on the Guardian website that they were offering a spreadsheet of the AV results broken down by different areas in the UK. I played around with it a bit, and then thought that it might be interesting to compare the AV results to the local council election results. Yes, there’s a load of correlation not implying causation in that idea: people who voted in the AV referendum may not have voted in the local council elections, and people who did vote in both may not have voted in line with the party they were supporting. In other words, many people may have voted Lib Dem, the party which favours AV the most, and then voted against AV. Still, cross-comparing the results from the referendum and the local elections should, at an overall level, give some basic indication of the feeling and political vibe in different areas. Again, remember, this is all for fun. I’m more than happy to admit that I’m not an expert (or even a novice) in political science, if that’s what you even call this whole “running stats on votes” thing that I’m doing here.

I took the spreadsheet regarding AV from the Guardian and then headed off in search of a similar spreadsheet containing local election results. The closest I could find was on the Telegraph website. I think this one only covers England. Most sites give a breakdown of the local election results in a format that isn’t easy to put into a spreadsheet (i.e., I’d have to sit here for hours cross-tabulating the ones that are missing), so I’m going with what I can get. I was surprised to find that our dear old government doesn’t retain a centralised copy of the results and put them on a website.

With two datasets in hand, one called av and one called les (Local council ElectionS), I was ready to start. I ran a merge of the two to get started:

# Inner join on the column(s) the two data frames share (here, name)
combined_base = merge(av, les)

In both datasets, there is a column called name which is used to match everything up. By default, merge keeps only the rows whose name appears in both datasets. As my AV dataset contains more rows than the local council elections dataset, I end up with only those areas in the AV dataset that also appeared in the local council elections dataset. This gave me 279 rows.
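To make the matching behaviour concrete, here is a minimal sketch on made-up data. The area names, numbers and the "NOC" code are invented for illustration and do not come from the Guardian or Telegraph spreadsheets. It shows merge keeping only names present in both data frames, setdiff listing what was dropped, and all.x = TRUE keeping the unmatched rows instead.

```r
# Toy stand-ins for the two datasets; every value here is invented.
av_toy <- data.frame(
  name     = c("Area A", "Area B", "Area C", "Area D"),
  yes_perc = c(30.1, 41.5, 28.7, 35.2)
)
les_toy <- data.frame(
  name   = c("Area A", "Area C", "Area D"),
  winner = c("C", "Lab", "NOC")  # "NOC" is a made-up code for no outright winner
)

# Default merge: an inner join on the shared column, name.
inner <- merge(av_toy, les_toy, by = "name")
nrow(inner)

# Areas in the AV data that have no local election result.
dropped <- setdiff(av_toy$name, les_toy$name)
dropped

# all.x = TRUE keeps every AV row; winner is NA where nothing matched.
left <- merge(av_toy, les_toy, by = "name", all.x = TRUE)
left
```

Checking setdiff before merging is a cheap way to spot areas that fail to match only because the two sources spell a name differently.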

Next up: select only the local councils where the Conservatives, Labour or Lib Dems gained overall council control (indicated by the winner column). Then create a new column called win_label which is a textual version of the shortened names (these are C, Lab and LD) listed in winner.

# Keep councils won outright by one of the three main parties
# (%in% also drops any rows where winner is NA)
combined = combined_base[combined_base$winner %in% c("C", "Lab", "LD"), ]
# Spell out the abbreviated party codes
combined$win_label[combined$winner == "C"] = "Conservative"
combined$win_label[combined$winner == "Lab"] = "Labour"
combined$win_label[combined$winner == "LD"] = "Liberal Democrat"
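As an alternative to one assignment per party, the filtering and relabelling can be sketched with a named vector used as a lookup table. This is only an illustration on invented data. The names toy, party_names and kept are hypothetical, and "NOC" is a made-up code standing in for any council without an outright winner.

```r
# Toy merged data; all values are invented.
toy <- data.frame(
  name   = c("Area A", "Area C", "Area D", "Area E"),
  winner = c("C", "Lab", "NOC", "LD"),
  stringsAsFactors = FALSE
)

# Named vector acting as a lookup table: short code -> full party name.
party_names <- c(C = "Conservative", Lab = "Labour", LD = "Liberal Democrat")

# Keep only rows whose winner is one of the three codes.
kept <- toy[toy$winner %in% names(party_names), ]

# Index the lookup by code to build the label column in one step.
# as.character() guards against winner being stored as a factor.
kept$win_label <- unname(party_names[as.character(kept$winner)])
kept
```

Adding a fourth party would then only mean adding one entry to party_names.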

Next we can do a histogram of the number of councils where each party were victorious, compared to the proportion of the electorate in those councils who voted YES to AV:

library(ggplot2)

# Histogram of YES percentages, one panel per winning party
ggplot(combined) +
  aes(x = yes_perc) +
  geom_histogram() +
  scale_x_continuous("Percentage of YES to AV votes") +
  scale_y_continuous("Number of Local Councils") +
  facet_wrap(~win_label)

The code gives us the following:

From the histograms, the defeat of the Lib Dems in the local elections is very clear. They hardly won anything.
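The same impression can be checked numerically by counting councils per winning party. The sketch below uses a small invented win_label column rather than the real data, so the counts are purely illustrative.

```r
# Invented labels standing in for combined$win_label.
toy <- data.frame(
  win_label = c("Conservative", "Conservative", "Labour",
                "Conservative", "Liberal Democrat", "Labour")
)

# Number of councils won by each party.
wins <- table(toy$win_label)
wins

# Share of councils per party, as percentages rounded to one decimal place.
round(100 * prop.table(wins), 1)
```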

Ok, so let’s take a look at it from a different angle. We have information available in the datasets regarding the percentage of people who voted in each area. Here’s the R code:

# Turnout against YES share, one point per council, coloured by winning party
ggplot(combined) +
  aes(x = yes_perc, y = turnout_perc, colour = win_label) +
  geom_point(size = 4) +
  scale_colour_manual(values = c("blue", "red", "orange")) +
  scale_x_continuous("Percentage of YES to AV votes") +
  scale_y_continuous("Percentage of Electorate who Voted")

Note the use of scale_colour_manual there to set each of the parties to their respective colours. I also resized the points within the geom_point command because the Lib Dem orange points were hard to see with the smaller default size.
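An unnamed colour vector is matched to the labels in their sorted order, which here happens to be Conservative, Labour, Liberal Democrat. A slightly more defensive variant, sketched below on invented data, names each colour so that the mapping no longer depends on that ordering. The party_colours name and the toy values are assumptions for illustration only.

```r
library(ggplot2)

# Invented data with the same column names as the merged dataset.
toy <- data.frame(
  yes_perc     = c(28, 33, 45, 31),
  turnout_perc = c(47, 36, 44, 50),
  win_label    = c("Conservative", "Labour", "Liberal Democrat", "Conservative")
)

# Named vector: each label is tied explicitly to its colour.
party_colours <- c(
  "Conservative"     = "blue",
  "Labour"           = "red",
  "Liberal Democrat" = "orange"
)

p <- ggplot(toy, aes(x = yes_perc, y = turnout_perc, colour = win_label)) +
  geom_point(size = 4) +
  scale_colour_manual(values = party_colours)
p
```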

Aside from the one rare instance where there was a high YES to AV vote and also a Lib Dem council being voted in (i.e., what would be expected), it seems there is a strong clustering towards a low proportion of YES votes.

One other point about this graph that stands out. Take a look at how the councils where Labour (red) were voted in tend to fall in areas where less of the electorate voted. When 45% or more voted, the Conservatives dominated, except for three Lib Dem wins.
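A reading like this can be checked against the data by cross-tabulating the winning party against a turnout threshold. The sketch below uses invented numbers, so its counts say nothing about the real results. The high_turnout column is a hypothetical helper, and the cut-off is the 45% mentioned above.

```r
# Invented turnout figures and winners.
toy <- data.frame(
  turnout_perc = c(47, 36, 44, 50, 39, 46),
  win_label    = c("Conservative", "Labour", "Labour",
                   "Conservative", "Labour", "Liberal Democrat")
)

# Flag councils where 45% or more of the electorate voted.
toy$high_turnout <- toy$turnout_perc >= 45

# Rows are parties, columns are FALSE/TRUE for high turnout.
counts <- table(party = toy$win_label, high_turnout = toy$high_turnout)
counts
```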

Summary Stats

Finally, let’s look at some descriptive stats as a summary. Here’s the code.

library(plyr)

# Mean YES share and mean turnout for each value of Council
ddply(combined, c("Council"), summarize,
      "Yes Percentage (mean)" = mean(yes_perc),
      "Turnout Percentage (mean)" = mean(turnout_perc))

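For readers without plyr, a similar grouped summary can be sketched in base R with aggregate. The data below are invented, and grouping by win_label is my assumption for the sake of the example (the post's own code groups by a column called Council).

```r
# Invented data; column names follow the merged dataset used above.
toy <- data.frame(
  win_label    = c("Conservative", "Conservative", "Labour", "Labour"),
  yes_perc     = c(28, 32, 35, 39),
  turnout_perc = c(46, 50, 36, 40)
)

# Mean of both percentage columns within each party.
summary_tbl <- aggregate(cbind(yes_perc, turnout_perc) ~ win_label,
                         data = toy, FUN = mean)
summary_tbl
```
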
And here’s the table:
