informality, the 2010 edition

[This article was first published on simon jackman's blog » R, and kindly contributed to R-bloggers.]

For the last two cycles I’ve done some simple regression analysis of the informal vote.  I saw Possum have his go at it, using a specification that is virtually the same as what I’ve run in the past (2007, 2004).

The 2010 edition follows. As usual, electorate-level informality in House of Reps voting increases with (a) the number of candidates on the ballot; (b) the percentage of the electorate residing in non-English-speaking households (NESH); and (c) whether the state uses optional preferential voting in its state legislative elections (NSW & QLD); but decreases with (d) the percentage of the electorate with tertiary qualifications.

The basic linear spec gets you:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.196338   0.304983  13.759  < 2e-16 ***
OPTRUE       1.618110   0.146803  11.022  < 2e-16 ***
neshp        0.104462   0.005328  19.606  < 2e-16 ***
Nominations  0.126415   0.045005   2.809  0.00566 **
unip        -0.139205   0.010577 -13.162  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.8898 on 145 degrees of freedom
Multiple R-squared: 0.7977,	Adjusted R-squared: 0.7921

Not too shabby for 4 linear, additive predictors.
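
For anyone wanting to try a spec like this, here is a minimal sketch, not the original script. The electorate data are not included in this post, so the data frame below is simulated: the variable names are taken from the output above, but the ranges, the OP/non-OP split and the seed are all assumptions made for illustration. The last few lines read the reported coefficients for one hypothetical seat.

```r
# Simulated stand-in for the 150 electorates. These are NOT the real data.
set.seed(2010)
n <- 150
d <- data.frame(
  OP          = rep(c(TRUE, FALSE), length.out = n),  # arbitrary OP/non-OP split
  neshp       = runif(n, 0, 60),                      # % in NESH households
  Nominations = sample(3:11, n, replace = TRUE),      # candidates on the ballot
  unip        = runif(n, 5, 40)                       # % with tertiary quals
)

# Use the estimates reported above as the "true" values for the simulation.
b <- c(4.196338, 1.618110, 0.104462, 0.126415, -0.139205)
d$InformalPercent <- b[1] + b[2] * d$OP + b[3] * d$neshp +
  b[4] * d$Nominations + b[5] * d$unip + rnorm(n, sd = 0.8898)

# Four linear, additive predictors, as in the output above.
fit <- lm(InformalPercent ~ OP + neshp + Nominations + unip, data = d)
print(summary(fit))

# Reading the reported coefficients: a hypothetical OP-state seat with
# 20% NESH, 6 candidates and 15% tertiary-qualified.
by_hand <- 4.196338 + 1.618110 + 0.104462 * 20 + 0.126415 * 6 - 0.139205 * 15
print(round(by_hand, 2))  # 6.57
```

So the linear spec would put that made-up seat at roughly 6.6 percent informal. The fitted numbers from the simulated data will be close to, but not the same as, the table above.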

You can do a little better with semi-parametric terms (thin-plate smoothing splines, via the mgcv package in R) in the NESH and tertiary predictors, and an interaction with OP/non-OP:

Formula:
InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) +
    Nominations

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.11470    0.19677   20.91  < 2e-16 ***
OPfTRUE      1.62839    0.11146   14.61  < 2e-16 ***
Nominations  0.11768    0.03362    3.50 0.000636 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Approximate significance of smooth terms:
                    edf Ref.df     F  p-value
s(neshp):OPfFALSE 2.554  3.200 23.78 7.17e-13 ***
s(neshp):OPfTRUE  7.515  8.424 95.63  < 2e-16 ***
s(unip):OPfFALSE  3.057  3.769 26.34 1.81e-15 ***
s(unip):OPfTRUE   2.558  3.170 50.91  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

R-sq.(adj) =  0.895   Deviance explained = 90.8%
GCV score = 0.45629  Scale est. = 0.39945   n = 150
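
A rough sketch of a call that would produce output of this shape, using the formula printed above. Again the data are simulated, and the nonlinear shapes I have built into the fake response are assumptions for illustration, not a claim about the real electorates. The one thing that matters is that OPf is a factor, so that by = OPf gives a separate smooth for each level; s() uses a thin-plate basis by default.

```r
library(mgcv)

# Simulated stand-in for the electorate data. NOT the real data.
set.seed(2010)
n <- 150
d <- data.frame(
  OPf         = factor(rep(c(TRUE, FALSE), length.out = n)),  # arbitrary split
  neshp       = runif(n, 0, 60),
  unip        = runif(n, 5, 40),
  Nominations = sample(3:11, n, replace = TRUE)
)
op <- d$OPf == "TRUE"

# Made-up response: the NESH effect has a different shape in OP states.
d$InformalPercent <- 4 + 1.6 * op + 0.1 * d$Nominations +
  ifelse(op, 3 * log1p(d$neshp / 10), 0.05 * d$neshp) -
  0.12 * d$unip + rnorm(n, sd = 0.6)

# Same formula as in the output above: one smooth of neshp and one of unip
# per level of OPf, plus a linear term in Nominations.
fit <- gam(InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) +
             Nominations, data = d)
print(summary(fit))
```

The summary lists four smooth terms, one per predictor per OPf level, which is the layout of the "Approximate significance of smooth terms" table.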

Update: by request, the four smooth terms from the GAM (PDF).
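
One way a plot of the smooth terms could be written to a PDF, sketched with mgcv's plot method for fitted GAMs. The data are simulated as a stand-in, and the file name and page size are my own choices for illustration, not those of the original figure.

```r
library(mgcv)

# Simulated stand-in data (NOT the real electorates), kept deliberately small.
set.seed(1)
n <- 150
d <- data.frame(
  OPf         = factor(rep(c(TRUE, FALSE), length.out = n)),
  neshp       = runif(n, 0, 60),
  unip        = runif(n, 5, 40),
  Nominations = sample(3:11, n, replace = TRUE)
)
d$InformalPercent <- 4 + 1.6 * (d$OPf == "TRUE") + 0.1 * d$neshp -
  0.12 * d$unip + 0.1 * d$Nominations + rnorm(n, sd = 0.6)

fit <- gam(InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) +
             Nominations, data = d)

# Hypothetical output path; all four smooths are drawn on a single page.
out <- file.path(tempdir(), "smooths.pdf")
pdf(out, width = 8, height = 8)
plot(fit, pages = 1, shade = TRUE)  # one panel per smooth, with shaded bands
dev.off()
```

Each panel shows the estimated smooth for one predictor within one OPf level, centred on zero, with its confidence band shaded.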
