New data and functions in nzelect 0.3.0 R package

[This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers.]

Polling data and other goodies ready for download

A new version, 0.3.0, of the nzelect R package is now available on CRAN. New in this version:

  • historical polling data from 2002 to February 2017, sourced from Wikipedia;
  • some small functions to help convert voting numbers into seats in a New Zealand or similar proportional representation system, and to weight polling numbers in the way done by two of New Zealand’s polling aggregator websites; and
  • some small bits of metadata on political parties, most notably named vectors of their colours for use in graphics.

nzelect was originally developed in response to Ari Lamstein’s R election analysis contest, and my series of blog posts drawing on its data won me that competition. The first major version included polling-place election results from the 2014 election. The main purpose of the package remains to make these data available in a tidier, more analysis-ready format than the Electoral Commission’s official election results site.

I still have plans to add the results from earlier elections, but with a New Zealand general election now scheduled for 23 September 2017 I thought I’d prioritise some polling data. I’m hoping to up the level of sophistication of at least a corner of the debate in the lead-up to the election. I’ve got some blog posts coming up on things like house effects (e.g. historical biases of different polling firms when confronted with actual election results) and probabilistic prediction, and needed a clean and tidy set of historical data to do this.

Historical polling data

The polling data are the main addition to this version of nzelect. The data have been scraped from a range of Wikipedia pages and subjected to some cleaning. With this somewhat sketchy provenance, their accuracy can’t be guaranteed, but they look very plausible. All the data have been combined in a single data frame, polls, which looks like this:

           Pollster      WikipediaDates  StartDate    EndDate    MidDate         Party VotingIntention   Client ElectionYear
5965 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13 United Future            0.00 One News         2017
5966 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13         Maori            0.01 One News         2017
5967 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13       Destiny              NA One News         2017
5968 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13   Progressive              NA One News         2017
5969 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13          Mana            0.01 One News         2017
5970 Colmar Brunton 11–15 February 2017 2017-02-11 2017-02-15 2017-02-13  Conservative            0.00 One News         2017
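In the rows above, MidDate falls halfway between StartDate and EndDate. As a minimal sketch of how such a column could be derived, the snippet below uses a made-up toy data frame and assumes (this is a guess from the sample rows, not a documented rule of the package) that the midpoint is rounded down to a whole day:

```r
# Toy rows mimicking the StartDate / EndDate columns of `polls`
# (the second row is invented to show an odd-length fieldwork period).
toy_polls <- data.frame(
  StartDate = as.Date(c("2017-02-11", "2017-02-01")),
  EndDate   = as.Date(c("2017-02-15", "2017-02-04"))
)

# Assumption: MidDate = start date plus half the fieldwork period,
# rounded down to a whole day.
fieldwork_days <- as.numeric(toy_polls$EndDate - toy_polls$StartDate)
toy_polls$MidDate <- toy_polls$StartDate + floor(fieldwork_days / 2)

print(toy_polls)
```

Having a single date per poll is what makes it straightforward to put polls on a time axis.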

Election results (actual total party vote by party) are also included for convenience. This doesn’t remove the need to bring in the detailed historical election results by polling place in future versions of nzelect, but gives a start on the historical perspective.

Combining multiple election cycles of polling data together makes it possible to see the longer game in party political change in New Zealand:

Here’s the code behind that graphic.

library(tidyverse)
library(scales)
library(nzelect)
library(forcats)
library(stringr)
library(rvest)
library(magrittr)

#=========polls demo=========
# Dates of the actual elections, used for the vertical reference lines
election_dates <- polls %>%
   filter(Pollster == "Election result") %>%
   select(MidDate) %>%
   distinct()

polls %>%
   # leave Destiny and Progressive out of the plot
   filter(!Party %in% c("Destiny", "Progressive")) %>%
   # normalise any spelling matching "M.ori" (e.g. with a macron) to "Maori"
   mutate(Party = gsub("M.ori", "Maori", Party)) %>%
   # order facets by voting intention, and list election results first in the legend
   mutate(Party = fct_reorder(Party, VotingIntention, .desc = TRUE),
          Pollster = fct_relevel(Pollster, "Election result")) %>%
   ggplot(aes(x = MidDate, y = VotingIntention, colour = Pollster)) +
   geom_line(alpha = 0.5) +
   geom_point(size = 0.7) +
   # one smoothed trend per party, across all pollsters
   geom_smooth(aes(group = Party), se = FALSE, colour = "grey15", span = .20) +
   scale_y_continuous("Voting intention", labels = percent) +
   scale_x_date("") +
   facet_wrap(~Party, scales = "free_y") +
   geom_vline(xintercept = as.numeric(election_dates$MidDate), colour = "grey80") +
   ggtitle("15 years of voting intention opinion polls in New Zealand") +
   labs(caption = "Source: nzelect #Rstats package on CRAN")

The CRAN version of an R package isn’t the appropriate way to make day-to-day updates available – upgrades on CRAN should only be every three months or so at the most. I will probably keep the GitHub version up to date as more polling data comes in, but I’m not in a position to give a service level commitment on timeliness.

Converting voting results to seats

One thing we need to be able to do efficiently if we’re going to facilitate polling punditry is convert election results – real or hypothetical – into actual seats in Parliament. Since 1996, New Zealand has had 120 or more seats in its single house of parliament, allocated by a system known as “mixed-member proportional”. Each elector has two votes – an electorate vote and a party vote. 71 of the seats (at the time of writing) are “first past the post” electorates. However, these electorate votes have very little (though not quite zero) impact on the total make-up of Parliament, because the remaining seats are allocated to “lists” provided by the parties in such a way that each party’s total seats are in proportion to the electors’ party votes.

Only parties that receive at least five percent of the party vote, or that win at least one electorate, are counted in the proportional representation part.
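The eligibility rule is simple enough to express in a couple of lines. Here is a minimal sketch using made-up parties and vote shares (none of these numbers are real results, and the column names are just for illustration):

```r
# Hypothetical party-vote shares and electorate wins (made-up numbers).
parties <- data.frame(
  party       = c("A", "B", "C", "D"),
  vote_share  = c(0.47, 0.06, 0.04, 0.002),
  electorates = c(40, 0, 0, 1)
)

threshold <- 0.05

# A party takes part in the proportional allocation if it reaches the
# threshold OR has won at least one electorate.
parties$eligible <- parties$vote_share >= threshold | parties$electorates >= 1

print(parties)
```

Party C misses out with 4% and no electorate, while party D qualifies on a tiny party vote because it holds an electorate.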

The reason why there are 120 “or more” seats is the same as the reason why electorate votes have not quite zero impact on the overall make-up. If a party wins more electorates than it would be entitled to from its party vote, the difference is translated into “overhang seats”. After the 2014 election there was one such seat; the leader of the United Future party won the Ōhariu electorate, but the party received only 0.22% of the party vote (much less than 1/120th), so he holds the 121st seat in Parliament.

The method of allocating the list seats is known as the Sainte-Laguë or Webster/Sainte-Laguë method. It’s well explained on Wikipedia. It’s now available in nzelect via the allocate_seats() function.
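To make the mechanics concrete, here is a rough base-R sketch of a Sainte-Laguë allocation with a threshold and overhang seats. It is an illustration only, not the code inside allocate_seats(); the function name sainte_lague_sketch, the party names and all the vote counts are made up, and ties between quotients are not handled specially.

```r
# Illustrative Sainte-Laguë allocation; NOT the nzelect implementation.
sainte_lague_sketch <- function(votes, electorate, nseats = 120, threshold = 0.05) {
  share <- votes / sum(votes)
  eligible <- share >= threshold | electorate > 0

  # Divide each eligible party's votes by 1, 3, 5, ...; ineligible parties get zeros.
  divisors <- seq(1, by = 2, length.out = nseats)
  quotients <- outer(votes * eligible, divisors, "/")

  # The nseats largest quotients win one seat each; count them by party (row).
  winners <- order(quotients, decreasing = TRUE)[seq_len(nseats)]
  entitlement <- tabulate(row(quotients)[winners], nbins = length(votes))

  # Overhang: a party keeps every electorate it won, even beyond its entitlement.
  seats <- pmax(entitlement, electorate)
  data.frame(party = names(votes), entitlement = entitlement, seats = seats)
}

# Hypothetical party votes and electorate wins (made-up numbers).
votes <- c(Big = 47000, Medium = 25000, Green = 11000,
           Small = 9000, Fringe = 4000, Local = 220)
electorate <- c(40, 28, 0, 0, 0, 1)

result <- sainte_lague_sketch(votes, electorate)
print(result)
cat("Total seats:", sum(result$seats), "\n")
```

With these made-up numbers, Fringe falls below the threshold and gets nothing, while Local has an entitlement of zero but holds one electorate, so the house grows to 121 seats, much like the 2014 situation described above.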

allocate_seats defaults to New Zealand settings (5% threshold, 120 seats if no overhang seats) but these can be set to other values for use in other electoral systems or for conducting thought experiments about New Zealand. For example, the 5% threshold acts as a barrier to small parties getting representation in Parliament proportionate to their support. A 2012 review recommended, amongst other things, reducing this to 4%, although this wasn’t adopted by Parliament. Other countries with similar systems have lower thresholds; for example, Israel has had no fewer than four different threshold figures in the past thirty years (1%, 1.5%, 2%, and the current value of 3.25%).

Here’s how the New Zealand Parliament would have looked with different values of the threshold for getting access to the proportional representation part of the system:

We can see that the Conservative party were the big losers from the 5% threshold rule. With 3.97% of the party vote, they won no seats in Parliament under current rules (or indeed under the 4% threshold proposed and rejected in 2012), but would have won five seats under the Israeli threshold.

Using electoral results data and functions from the nzelect and other packages, here’s how that analysis was done:

#================2014 threshold scenarios=========
# Party vote results from the 2014 election
results <- GE2014 %>%
   mutate(VotingType = paste0(VotingType, "Vote")) %>%
   group_by(Party, VotingType) %>%
   summarise(Votes = sum(Votes, na.rm = TRUE)) %>%
   spread(VotingType, Votes) %>%
   select(Party, PartyVote) %>%
   ungroup() %>%
   filter(!is.na(PartyVote)) %>%
   filter(Party != "Informal Party Votes") %>%
   arrange(desc(PartyVote)) %>%
   mutate(Party = gsub(" Party", "", Party),
          Party = gsub("New Zealand First", "NZ First", Party),
          Party = gsub("Internet MANA", "Mana", Party)) %>%
   left_join(parties_df[, c("Shortname", "Colour")], by = c("Party" = "Shortname"))

# Electorate seats from the 2014 election, in the same order as the rows of `results`
electorate <- c(41, 27, 0, 
                0, 0, 0, 
                1, 1, 0,
                1, 0, 0,
                0, 0, 0)

# Seats awarded under different scenarios:
# (tibble() replaces the deprecated data_frame(); current forcats names the
# summary-function argument of fct_reorder() `.fun`)
res2014 <- tibble(
   party = results$Party,
   `5% threshold\n(actual result)` = 
      allocate_seats(results$PartyVote, electorate = electorate)$seats_v,
   `3.25% threshold like Israel\n(hypothetical result)` = 
      allocate_seats(results$PartyVote, electorate = electorate, threshold = 0.0325)$seats_v,
   `No threshold\n(hypothetical result)` = 
      allocate_seats(results$PartyVote, electorate = electorate, threshold = 0)$seats_v
) %>%
   gather(scenario, seats, -party) %>%
   mutate(party = str_wrap(party, 15),
          party = fct_reorder(party, -seats, .fun = mean),
          scenario = fct_relevel(scenario, "5% threshold\n(actual result)"),
          # keep the ten largest parties and lump the rest into "Other"
          party = fct_other(party, keep = levels(party)[1:10])) %>%
   group_by(party, scenario) %>%
   summarise(seats = sum(seats)) 
   
res2014 %>%
   ggplot(aes(x = seats, y = scenario, label = seats))  +
   facet_wrap(~party, ncol = 4) +
   geom_segment(xend = 0, aes(yend = scenario, colour = party), size = 3, alpha = 0.3) +
   scale_colour_manual(values = parties_v) +
   scale_y_discrete("", limits =  levels(res2014$scenario)[3:1]) +
   geom_text(size = 3) +
   theme(legend.position = "none") +
   ggtitle("Impact of changing minimum threshold for seats in Parliament",
           "New Zealand 2014 election seat allocation") +
   labs(x = "Seats", 
        caption = "Analysis using the `nzelect` R package")

For comparison, here are the same scenarios with the 2011 election results:

Because the full 2011 election results aren’t yet available in nzelect, the code below needs to scrape them from an HTML table on the Electoral Commission’s site, using the very user-friendly rvest R package:

#================2011 threshold scenarios=========
res2011 <- "http://archive.electionresults.govt.nz/electionresults_2011/partystatus.html" %>%
   read_html() %>%
   # extract all the tables:
   html_nodes(css = "table") %>%
   # we only want the fourth table:
   extract2(4) %>%
   # turn it into a data frame:
   html_table() %>%
   # drop the first four rows:
   slice(-(1:4))

# give it some real names:
names(res2011) <- c("Party", "PartyVote", "PropVote", "ElectorateSeats", "ListSeats", "TotalSeats")

res2011 <- res2011 %>%
   # turn into numbers:
   mutate(PartyVote = as.numeric(gsub(",", "", PartyVote)),
          ElectorateSeats = as.numeric(ElectorateSeats)) %>%
   # remove the "total seats" row:
   filter(!is.na(PartyVote)) %>%
   # estimate seats
   mutate(
      `5% threshold\n(actual result)` = 
         allocate_seats(PartyVote, electorate = ElectorateSeats)$seats_v,
      `3.25% threshold like Israel\n(hypothetical result)` = 
         allocate_seats(PartyVote, electorate = ElectorateSeats, threshold = 0.0325)$seats_v,
      `No threshold\n(hypothetical result)` = 
         allocate_seats(PartyVote, electorate = ElectorateSeats, threshold = 0)$seats_v
   )

# Scenario order for the y axis, listed bottom to top (actual result on top).
# Spelled out here so this block doesn't depend on the res2014 object.
scenario_order <- c("No threshold\n(hypothetical result)",
                    "3.25% threshold like Israel\n(hypothetical result)",
                    "5% threshold\n(actual result)")

res2011 %>%
   select(-(PartyVote:TotalSeats)) %>%
   gather(scenario, seats, -Party) %>%
   mutate(Party = gsub(" Party", "", Party),
          Party = gsub("New Zealand First", "NZ First", Party)) %>%
   left_join(parties_df[, c("Shortname", "Colour")], by = c("Party" = "Shortname")) %>%
   mutate(Party = fct_reorder(Party, -seats),
          Party = fct_other(Party, keep = levels(Party)[1:11])) %>%
   ggplot(aes(x = seats, y = scenario, label = seats))  +
   facet_wrap(~Party, ncol = 4) +
   geom_segment(xend = 0, aes(yend = scenario, colour = Party), size = 3, alpha = 0.3) +
   scale_colour_manual(values = parties_v) +
   scale_y_discrete("", limits = scenario_order) +
   geom_text(size = 3) +
   theme(legend.position = "none") +
   ggtitle("Impact of changing minimum threshold for seats in Parliament",
           "New Zealand 2011 election seat allocation") +
   labs(x = "Seats", y = "",
        caption = "Analysis using the `nzelect` R package")

New Zealand has a Westminster-like rather than USA-like relation of parliament to the executive, in that the government needs to command a majority in Parliament (indicated in budget or confidence votes) or resign. In practice, this generally leads to government by a coalition of parties, although this is not inevitable; in the 2014 election the National Party came within a whisker of being able to govern by itself (although political logic suggests they would have kept at least some junior coalition parties anyway).

nzelect is available for installation from CRAN the usual way (install.packages("nzelect")). Report any bugs, enhancement requests or issues on GitHub please. I already know about the wrong vignette title on CRAN :(… will fix it next release.

