nzelect 0.4.0 on CRAN with results from 2002 to 2014 and polls up to September 2017 by @ellis2013nz

[This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

More nzelect New Zealand election data on CRAN

Version 0.4.0 of my nzelect R package is now on CRAN. The key changes from version 0.3.0 are:

  • election results by voting place are now available back to 2002 (was just 2014)
  • polling place locations, presented as consistent values of latitude and longitude, now available back to 2008 (was just 2014)
  • voting intention polls are now complete up to the 2017 general election (previously stopped about six months ago on CRAN, although the GitHub version was always kept up to date)
  • a few minor bug fixes eg allocate_seats now takes integer arguments, not just numeric.

The definitive source of New Zealand election statistics is the Electoral Commission. If there are any discrepencies between their results and those in nzelect, it’s a bug, and please file an issue on GitHub. The voting intention polls come from Wikipedia.

Example – special and early votes

Currently, while we wait for the counting of the “special” votes from the 2017 election, there’s renewed interest in the differences between special votes and those counted on election night. Special votes are those made by anyone outside their electorate, enrolled in the month before the election, or are in one of a few other categories. Most importantly, it’s people who are on the move or who are very recently enrolled, and obviously such people are different from the run of the mill voter.

Here’s a graphic created with the nzelect R package that shows how the “special votes” in the past have been disproportionately important for Greens and Labour:

Here’s the code to create that graphic:

# get the latest version from CRAN:
install.packages("nzelect")

library(nzelect)
library(tidyverse)
library(ggplot2)
library(scales)
library(ggrepel)
library(forcats)

# palette of colours for the next couple of charts: 
palette <- c(parties_v, Other = "pink2", `Informal Votes` = "grey")

# special votes:
nzge %>%
  filter(voting_type == "Party") %>%
  mutate(party = fct_lump(party, 5)) %>%
  mutate(dummy = grepl("special", voting_place, ignore.case = TRUE)) %>%
  group_by(electorate, party, election_year) %>%
  summarise(prop_before = sum(votes[dummy]) / sum(votes),
            total_votes = sum(votes)) %>%
  ungroup() %>%
  mutate(party = gsub(" Party", "", party),
         party = gsub("ACT New Zealand", "ACT", party),
         party = gsub("New Zealand First", "NZ First", party)) %>%
  mutate(party = fct_reorder(party, prop_before)) %>%
  ggplot(aes(x = prop_before, y = party, size = total_votes, colour = party)) +
  facet_wrap(~election_year) +
  geom_point(alpha = 0.1) +
  ggtitle("'Special' votes proportion by party, electorate and year",
          "Each point represents the proportion of a party's vote in each electorate that came from special votes") +
  labs(caption = "Source: www.electionresults.govt.nz, collated in the nzelect R package",
       y = "") +
  scale_size_area("Total party votes", label = comma) +
  scale_x_continuous("\nPercentage of party's votes that were 'special'", label = percent) +
  scale_colour_manual(values = palette, guide = FALSE)

Special votes are sometimes confused with advance voting in general. While many special votes are advance votes, the relationship is far from one to one. We see this particularly acutely by comparing the previous graphic to one that is identical except that it identifies all advance votes (those with the phrase “BEFORE” in the Electoral Commission’s description of polling place):

While New Zealand First are the party that gains least proportionately from special votes, they gain the most from advance votes, although the difference between parties is fairly marginal. New Zealand First voters are noticeably more likely to be in an older age bracket than the voters for other parties. My speculation on their disproportionate share of advance voting is that it is related to that, although I’m not an expert in that area and am interested in alternative views.

This second graphic also shows nicely just how much the advance voting is becoming a feature of the electoral landscape. Unlike the proportion of votes that are “special” which has been fairly stable, the proportion of votes that are case in advance has increased very substantially over the past decade, and increased further in the 2017 election (for which final results come out on Saturday).

Here’s the code for the second graphic; it’s basically the same as the previous chunk of code, except filtering on a different character string in the voting place name:

nzge %>%
  filter(voting_type == "Party") %>%
  mutate(party = fct_lump(party, 5)) %>%
  mutate(dummy = grepl("before", voting_place, ignore.case = TRUE)) %>%
  group_by(electorate, party, election_year) %>%
  summarise(prop_before = sum(votes[dummy]) / sum(votes),
            total_votes = sum(votes)) %>%
  ungroup() %>%
  mutate(party = gsub(" Party", "", party),
         party = gsub("ACT New Zealand", "ACT", party),
         party = gsub("New Zealand First", "NZ First", party)) %>%
  mutate(party = fct_reorder(party, prop_before)) %>%
  ggplot(aes(x = prop_before, y = party, size = total_votes, colour = party)) +
  facet_wrap(~election_year) +
  geom_point(alpha = 0.1) +
  ggtitle("'Before' votes proportion by party, electorate and year",
          "Each point represents the proportion of a party's vote in each electorate that were cast before election day") +
  labs(caption = "Source: www.electionresults.govt.nz, collated in the nzelect R package",
       y = "") +
  scale_size_area("Total party votes", label = comma) +
  scale_x_continuous("\nPercentage of party's votes that were before election day", label = percent) +
  scale_colour_manual(values = palette, guide = FALSE)

‘Party’ compared to ‘Candidate’ vote

Looking for something else to showcase, I thought it might be interesting to pool all five elections for which I have the detailed results and compare the party vote (ie the proportional representation choice out of the two votes New Zealanders get) to the candidate vote (ie the representative member choice). Here’s a graphic that does just that:

We see that New Zealand First and the Greens are the two parties that are most noticeably above the diagonal line indicating equality between party and candidate votes. This isn’t a surprise – these are minority parties that appeal to (different) demographic and issues-based communities that are dispersed across the country, and generally have little chance of winning individual electorates. Hence the practice of voters is often to split their votes. This is all perfectly fine and is exactly how mixed-member proportional voting systems are meant to work.

Here’s the code that produced the scatter plot:

nzge %>%
  group_by(voting_type, party) %>%
  summarise(votes = sum(votes)) %>%
  spread(voting_type, votes) %>%
  ggplot(aes(x = Candidate, y = Party, label = party)) +
  geom_abline(intercept = 0, slope = 1, colour = "grey50") +
  geom_point() +
  geom_text_repel(colour = "steelblue") +
  scale_x_log10("Total 'candidate' votes", label = comma, breaks = c(1, 10, 100, 1000) * 1000) +
  scale_y_log10("Total 'party' votes", label = comma, breaks = c(1, 10, 100, 1000) * 1000) +
  ggtitle("Lots of political parties: total votes from 2002 to 2014",
          "New Zealand general elections") +
  labs(caption = "Source: www.electionresults.govt.nz, collated in the nzelect R package") +
  coord_equal()

What next?

Obviously, the next big thing for nzelect is to get the 2017 results in once they are announced on Saturday. I should be able to do this for the GitHub version by early next week. I would have delayed the CRAN release until 2017 results were available, but unfortunately I had a bug in some of the examples in my helpfiles that stopped them working after the 23 September 2017, so I had to rush a fix and the latest enhancements to CRAN to avoid problems with the CRAN maintainers (which I fully endorse and thank by the way).

My other plans for nzelect over the next months to a year include:

  • reliable point locations for voting places back to 2002
  • identify consistent/duplicate voting places over time to make it easier to analyse comparative change by micro location
  • add detailed election results for the 1999 and 1996 elections (these are saved under a different naming convention to those from 2002 onwards, which is why they need a bit more work)
  • add high level election results for prior to 1994

The source code for cleaning the election data and packaging it into nzelect is on GitHub. The package itself is on CRAN and installable in the usual way.

To leave a comment for the author, please follow the link and comment on their blog: Peter's stats stuff - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)