(This article was first published on **simon jackman's blog » R**, and kindly contributed to R-bloggers)

For the last two cycles I’ve done some simple regression analysis of the informal vote. I saw Possum have his go at it, using a specification that is virtually the same as what I’ve run in the past (2007, 2004).

The 2010 edition follows. As usual, electorate-level informality in House of Reps voting increases with (a) the number of candidates on the ballot; (b) the percentage of the electorate residing in non-English-speaking households (NESH); and (c) whether the state uses optional preferential voting in its state legislative elections (NSW & QLD); but decreases with (d) the percentage of the electorate with tertiary qualifications.

The basic linear spec gets you:

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.196338   0.304983  13.759  < 2e-16 ***
    OPTRUE       1.618110   0.146803  11.022  < 2e-16 ***
    neshp        0.104462   0.005328  19.606  < 2e-16 ***
    Nominations  0.126415   0.045005   2.809  0.00566 **
    unip        -0.139205   0.010577 -13.162  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.8898 on 145 degrees of freedom
    Multiple R-squared: 0.7977, Adjusted R-squared: 0.7921
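For readers who want to try a specification like this, here is a minimal sketch of the kind of `lm()` call that yields a coefficient table with these row names. The post does not show the model call or the data, so the data frame `electorates` and every value in it are simulated stand-ins (an assumption, not the real 2010 data); only the variable names are taken from the output above.

```r
# Simulated stand-in data: NOT the real electorate-level data.
set.seed(2010)
n <- 150
electorates <- data.frame(
  OP          = rep(c(TRUE, FALSE), length.out = n),  # state uses optional preferential voting
  neshp       = runif(n, 0, 60),                      # % in non-English-speaking households
  unip        = runif(n, 5, 45),                      # % with tertiary qualifications
  Nominations = sample(3:11, n, replace = TRUE)       # candidates on the ballot
)

# Fake outcome whose signs follow the description in the post (values are illustrative only).
electorates$InformalPercent <- with(
  electorates,
  4 + 1.6 * OP + 0.10 * neshp + 0.13 * Nominations - 0.14 * unip + rnorm(n, sd = 0.9)
)

# Four linear, additive predictors; a logical OP shows up as the row "OPTRUE".
fit_lm <- lm(InformalPercent ~ OP + neshp + Nominations + unip, data = electorates)
summary(fit_lm)
```

With 150 rows and five estimated coefficients, the fit has 145 residual degrees of freedom, as in the output above.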

Not too shabby for 4 linear, additive predictors.

You can do a little better with semi-parametric terms (thin-plate smoothing splines, via the mgcv package in R) in the NESH and tertiary predictors, and an interaction with OP/non-OP:

    Formula:
    InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) + Nominations

    Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  4.11470    0.19677   20.91  < 2e-16 ***
    OPfTRUE      1.62839    0.11146   14.61  < 2e-16 ***
    Nominations  0.11768    0.03362    3.50 0.000636 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Approximate significance of smooth terms:
                        edf Ref.df     F  p-value
    s(neshp):OPfFALSE 2.554  3.200 23.78 7.17e-13 ***
    s(neshp):OPfTRUE  7.515  8.424 95.63  < 2e-16 ***
    s(unip):OPfFALSE  3.057  3.769 26.34 1.81e-15 ***
    s(unip):OPfTRUE   2.558  3.170 50.91  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    R-sq.(adj) =  0.895   Deviance explained = 90.8%
    GCV score = 0.45629  Scale est. = 0.39945  n = 150
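A rough sketch of how a model with this formula might be fitted with `mgcv::gam()` follows. The formula is copied from the output above; the data frame `electorates` is again simulated (an assumption, since the real data are not shown), and `method = "GCV.Cp"` is spelled out only because the output reports a GCV score.

```r
library(mgcv)

# Simulated stand-in data: NOT the real electorate-level data.
set.seed(2010)
n <- 150
electorates <- data.frame(
  OP          = rep(c(TRUE, FALSE), length.out = n),
  neshp       = runif(n, 0, 60),
  unip        = runif(n, 5, 45),
  Nominations = sample(3:11, n, replace = TRUE)
)
electorates$InformalPercent <- with(
  electorates,
  4 + 1.6 * OP + 0.10 * neshp + 0.13 * Nominations - 0.14 * unip + rnorm(n, sd = 0.9)
)

# "by" smooths need a factor, hence OPf rather than the logical OP.
electorates$OPf <- factor(electorates$OP)

# s() defaults to thin-plate regression splines; by = OPf fits a separate
# smooth for OP and non-OP states, giving four smooth terms in total.
fit_gam <- gam(
  InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) + Nominations,
  data   = electorates,
  method = "GCV.Cp"
)
summary(fit_gam)
```

The `OPf` main effect stays in the formula because each `by` smooth is centred, so the level shift between OP and non-OP states has to be carried by the parametric term.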

Update: by request, the four smooth terms from the GAM.

PDF
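A figure of this kind can be drawn with the `plot()` method that mgcv provides for fitted GAMs. The sketch below refits the model on simulated stand-in data (an assumption, as before) and writes all four smooths to one page of a PDF; the file path is a temporary, hypothetical one.

```r
library(mgcv)

# Simulated stand-in data: NOT the real electorate-level data.
set.seed(2010)
n <- 150
electorates <- data.frame(
  OPf         = factor(rep(c(TRUE, FALSE), length.out = n)),
  neshp       = runif(n, 0, 60),
  unip        = runif(n, 5, 45),
  Nominations = sample(3:11, n, replace = TRUE)
)
electorates$InformalPercent <- with(
  electorates,
  4 + 1.6 * (OPf == "TRUE") + 0.10 * neshp + 0.13 * Nominations - 0.14 * unip +
    rnorm(n, sd = 0.9)
)

fit_gam <- gam(
  InformalPercent ~ OPf + s(neshp, by = OPf) + s(unip, by = OPf) + Nominations,
  data = electorates
)

# One page, four panels (one per smooth term), with shaded confidence bands.
pdf_path <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(pdf_path, width = 8, height = 8)
plot(fit_gam, pages = 1, shade = TRUE)
dev.off()
```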
