# informality, the 2010 edition

August 24, 2010
(This article was first published on simon jackman's blog » R, and kindly contributed to R-bloggers)

For the last two election cycles I’ve done some simple regression analysis of the informal vote. I see Possum has had his go at it this time, using a specification that is virtually the same as the one I’ve run in the past (2007, 2004).

The 2010 edition follows. As usual, electorate-level informality in House of Reps voting increases with (a) the number of candidates on the ballot; (b) the percentage of the electorate residing in non-English-speaking households (NESH); and (c) whether the state uses optional preferential voting (OP) in its own state legislative elections (NSW and QLD); but it decreases with (d) the percentage of the electorate with tertiary qualifications.
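Written out, the linear specification is roughly the following. The symbol names are my own shorthand for the variables in the R output (OP is a 0/1 indicator for NSW and QLD; NESH and Tertiary are percentages of the electorate), and the sign pattern simply restates the expectations above; nothing constrains the signs in estimation.

$$
\text{Informal}_i = \beta_0 + \beta_1\,\text{OP}_i + \beta_2\,\text{NESH}_i + \beta_3\,\text{Nominations}_i + \beta_4\,\text{Tertiary}_i + \varepsilon_i,
\qquad \beta_1, \beta_2, \beta_3 > 0, \quad \beta_4 < 0
$$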

The basic linear spec gets you:

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.196338   0.304983  13.759  < 2e-16 ***
OPTRUE       1.618110   0.146803  11.022  < 2e-16 ***
neshp        0.104462   0.005328  19.606  < 2e-16 ***
Nominations  0.126415   0.045005   2.809  0.00566 **
unip        -0.139205   0.010577 -13.162  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8898 on 145 degrees of freedom
Multiple R-squared: 0.7977,  Adjusted R-squared: 0.7921
```


Not too shabby for 4 linear, additive predictors.
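To make the coefficients concrete, here is a small plain-Python sketch that plugs a made-up electorate into the fitted linear model, using the coefficients as printed (so rounded). The electorate's values (OP state, 20% NESH, 8 candidates, 20% tertiary) are purely hypothetical. The second function checks that the printed adjusted R-squared follows from n = 150 and four predictors.

```python
# Coefficients copied from the linear-model output (rounded as printed).
COEF = {
    "intercept": 4.196338,
    "op": 1.618110,           # OPTRUE: state uses optional preferential voting
    "neshp": 0.104462,        # per percentage point in NESH households
    "nominations": 0.126415,  # per candidate on the ballot
    "unip": -0.139205,        # per percentage point with tertiary qualifications
}


def predicted_informal(op: bool, neshp: float, nominations: int, unip: float) -> float:
    """Fitted informal-vote percentage from the linear model."""
    return (COEF["intercept"]
            + COEF["op"] * op          # bool counts as 0 or 1
            + COEF["neshp"] * neshp
            + COEF["nominations"] * nominations
            + COEF["unip"] * unip)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations, k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical electorate: OP state, 20% NESH, 8 candidates, 20% tertiary.
print(round(predicted_informal(True, 20, 8, 20), 2))  # 6.13
# Residual df is 150 - 4 - 1 = 145, as in the output.
print(round(adjusted_r2(0.7977, 150, 4), 4))          # 0.7921
```

Flipping `op` to `False` for the same electorate lowers the prediction by exactly the OPTRUE coefficient, about 1.6 points, which is the additive OP effect in this specification.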

You can do a little better with semi-parametric terms in the NESH and tertiary predictors (thin-plate smoothing splines, via the mgcv package in R), letting each smooth differ between OP and non-OP states:

```
Formula:
InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) +
    Nominations

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.11470    0.19677   20.91  < 2e-16 ***
OPfTRUE      1.62839    0.11146   14.61  < 2e-16 ***
Nominations  0.11768    0.03362    3.50 0.000636 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                    edf Ref.df     F  p-value
s(neshp):OPfFALSE 2.554  3.200 23.78 7.17e-13 ***
s(neshp):OPfTRUE  7.515  8.424 95.63  < 2e-16 ***
s(unip):OPfFALSE  3.057  3.769 26.34 1.81e-15 ***
s(unip):OPfTRUE   2.558  3.170 50.91  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.895   Deviance explained = 90.8%
GCV score = 0.45629  Scale est. = 0.39945   n = 150
```
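In equation form, the GAM keeps the OP indicator and the number of candidates as linear terms, but replaces the linear NESH and tertiary terms with smooth functions estimated separately for each group $g \in \{\text{OP}, \text{non-OP}\}$. The symbols $f_g$ and $h_g$ are my shorthand for the NESH and tertiary smooths respectively:

$$
\text{Informal}_i = \beta_0 + \beta_1\,\text{OP}_i + \beta_2\,\text{Nominations}_i + f_{g(i)}(\text{NESH}_i) + h_{g(i)}(\text{Tertiary}_i) + \varepsilon_i
$$

With a factor `by` variable, mgcv fits one centred smooth per factor level, which is why the output lists four smooth terms and why the OP main effect (`OPf`) still appears as a parametric coefficient. The NESH smooth for OP states is by far the most flexible of the four, at about 7.5 effective degrees of freedom against roughly 2.5 to 3 for the others.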

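The GCV score and scale estimate on the last line of the GAM summary are linked through the model's total effective degrees of freedom. The sketch below is a plain-Python consistency check on the printed numbers. It assumes mgcv's default GCV criterion (gamma = 1) and counts each parametric coefficient as one degree of freedom. It also compares the GAM's residual variance with the linear model's (the squared residual standard error).

```python
# Values read off the mgcv summary (rounded as printed).
n = 150
scale_est = 0.39945                       # "Scale est." = RSS / (n - total edf)
smooth_edf = [2.554, 7.515, 3.057, 2.558]  # the four s(...) terms
parametric_terms = 3                      # (Intercept), OPfTRUE, Nominations


def gcv_from_scale(n: int, scale: float, total_edf: float) -> float:
    """GCV = n * RSS / (n - edf)^2, rewritten as n * scale / (n - edf)."""
    return n * scale / (n - total_edf)


total_edf = parametric_terms + sum(smooth_edf)
print(round(total_edf, 3))                                # 18.684
print(round(gcv_from_scale(n, scale_est, total_edf), 4))  # 0.4563

# Residual variance: linear model (0.8898 squared) vs the GAM's scale estimate.
print(round(0.8898 ** 2, 4), scale_est)                   # 0.7917 0.39945
```

The reconstructed GCV score agrees with the printed 0.45629 up to rounding of the displayed edf values. The residual variance falls from roughly 0.79 to 0.40, about half.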

Update: by request, plots of the four smooth terms from the GAM (PDF).
