Web app for individual party vote from the 2014 New Zealand election study

(This article was first published on Peter's stats stuff - R, and kindly contributed to R-bloggers)

Last week I posted some analysis of individual voting behaviour in New Zealand’s 2014 general election. In that post, I used logistic regression in four separate models to predict the probability of an individual giving their party vote to each of the four largest parties: National, Labour, Green and New Zealand First. That let readers compare the people voting for each of those parties, one at a time, with the wider population.
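For reference, each of those four models has the standard binary logistic form. The notation here is ours, not from the original post. Write $x_i$ for person $i$'s explanatory variables and $\beta_k$ for the coefficients of the model for party $k$:

$$
\Pr(y_i = k \mid x_i) = \frac{1}{1 + \exp\left(-x_i^\top \beta_k\right)}, \qquad k \in \{\text{National}, \text{Labour}, \text{Green}, \text{NZ First}\}
$$

The four models are fitted independently. As a result, nothing forces one person's four predicted probabilities to be mutually consistent; for example, they need not sum to at most one. A joint model over all outcomes removes that inconsistency.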

A logical extension of this is to model party vote for those four categories, plus “other” and “did not vote”, simultaneously as a multinomial response. I tried this out with several different methods:

* a deep learning neural network (from H2O);
* random forest (trying out both the H2O version and ranger, a fast R/C++ implementation);
* multinomial log-linear regression (from nnet).

The aim was to produce an interactive web tool that lets people see the impact of changing one variable at a time on predicted voting probabilities.
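The minimal sketch below shows what "simultaneously as a multinomial response" means for the log-linear model. Each of the six categories gets its own linear predictor. One reference category has its predictor fixed at zero, and a softmax turns the predictors into probabilities that always sum to one. The choice of "Did not vote" as the reference category, the function name `multinomial_probs`, and the coefficient values are all hypothetical, for illustration only. The original work used R's `nnet::multinom`, not this code.

```python
import math

# The six outcome categories from the post. Putting "Did not vote" first,
# as the reference (baseline) category, is an assumption for this sketch.
CATEGORIES = ["Did not vote", "National", "Labour", "Green", "NZ First", "Other"]


def multinomial_probs(x, coefs):
    """Multinomial logit: softmax over one linear predictor per category.

    x     -- explanatory values for one person (x[0] = 1.0 is the intercept)
    coefs -- dict mapping each non-reference category to its coefficient list;
             the reference category's linear predictor is fixed at 0.
    """
    eta = {CATEGORIES[0]: 0.0}
    for cat in CATEGORIES[1:]:
        eta[cat] = sum(b * xj for b, xj in zip(coefs[cat], x))
    # Subtract the largest predictor before exponentiating, for numerical stability.
    m = max(eta.values())
    expd = {cat: math.exp(v - m) for cat, v in eta.items()}
    total = sum(expd.values())
    return {cat: e / total for cat, e in expd.items()}


# Hypothetical coefficients: an intercept plus one made-up standardised predictor.
x = [1.0, 0.5]
coefs = {
    "National": [0.8, 0.3],
    "Labour": [0.6, -0.2],
    "Green": [-0.1, -0.5],
    "NZ First": [-0.7, 0.1],
    "Other": [-1.0, 0.0],
}
probs = multinomial_probs(x, coefs)
print({cat: round(p, 3) for cat, p in probs.items()})
print(round(sum(probs.values()), 10))  # 1.0
```

Unlike the four separate logistic models, this single model cannot give one person a combined probability above one across parties.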

As in last week’s approach, I use about 20 explanatory variables in total, with 2,835 observations. My purpose was predictive analytics rather than structural inference, so I dealt with the survey weighting by brute force. I replicated each row, with the number of copies proportional to its calibrated survey weight (on average 10 copies per row). To regularise the predicted probabilities, I added some noise to the data in the form of extra missing values (for one variable per person). I then used a variant of multiple imputation to deal with the missing data.
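The weighting-by-replication and noise steps above could look roughly like the sketch below. This is a sketch under stated assumptions, not the author's code. The post does not say how copy counts were rounded, so rounding to the nearest integer with a floor of one copy is our assumption. The post is also ambiguous about whether the blanked-out variable was chosen once per person or once per replicated row; here we do it per replicated row. The function names and the toy rows are hypothetical. The imputation step that later fills the blanks is not shown.

```python
import random


def replicate_by_weight(rows, weights, avg_reps=10):
    """Turn a weighted sample into an unweighted one by copying rows.

    Each row is copied a number of times proportional to its survey weight,
    scaled so rows get about `avg_reps` copies on average. Rounding to the
    nearest integer, with at least one copy, is an assumption of this sketch.
    """
    mean_w = sum(weights) / len(weights)
    out = []
    for row, w in zip(rows, weights):
        n = max(1, round(avg_reps * w / mean_w))
        # dict(row) makes independent copies, so later edits don't leak across copies.
        out.extend(dict(row) for _ in range(n))
    return out


def add_missing_noise(rows, variables, rng):
    """Blank out (set to None) one randomly chosen explanatory variable per row."""
    for row in rows:
        row[rng.choice(variables)] = None
    return rows


# Hypothetical toy data: three respondents with made-up calibrated weights.
rows = [{"age": 34, "income": 3}, {"age": 61, "income": 5}, {"age": 22, "income": 1}]
weights = [0.5, 1.0, 1.5]
expanded = replicate_by_weight(rows, weights)
print(len(expanded))  # 30 (5 + 10 + 15 copies)
noisy = add_missing_noise(expanded, ["age", "income"], random.Random(2014))
```

With real weights, rounding means the average will only be close to, not exactly, the target number of copies.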

After some tuning with the very convenient h2o.grid function, the best-performing model was the neural network with two hidden layers of 60 neurons each and a high dropout rate between layers. However, it was a bit slow for end users once implemented in Shiny for the web app. I also anticipated further problems deploying an H2O model to shinyapps.io; I’ll address those at some point, but not today. So in the end I used an average of the ranger random forest and the nnet::multinom multinomial regression, which is fast and gives very plausible results.
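The averaging step is simple, because the mean of two valid probability vectors is itself a valid probability vector. The sketch below illustrates it in Python with made-up predictions for one person. The post says only "an average", so equal weighting is our assumption. The names `average_predictions`, `forest` and `multinom` are hypothetical stand-ins for the R models' outputs.

```python
def average_predictions(*model_probs):
    """Equal-weight average of per-category probabilities from several models.

    Each argument maps outcome category -> predicted probability for one
    person. Equal weighting is an assumption; the post says only "an average".
    """
    cats = model_probs[0].keys()
    n = len(model_probs)
    return {c: sum(p[c] for p in model_probs) / n for c in cats}


# Made-up predictions for one person from the two models.
forest = {"National": 0.50, "Labour": 0.20, "Green": 0.10,
          "NZ First": 0.05, "Other": 0.05, "Did not vote": 0.10}
multinom = {"National": 0.40, "Labour": 0.30, "Green": 0.10,
            "NZ First": 0.05, "Other": 0.05, "Did not vote": 0.10}
avg = average_predictions(forest, multinom)
print(round(avg["National"], 2))  # 0.45
```

Averaging a flexible tree ensemble with a smoother parametric model is a common way to temper either model's extremes. Both models are also quick to predict from, which suits an interactive Shiny app.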


As usual, comments, suggestions and corrections are welcomed.
