# Voting Twice in France

February 19, 2014
By

(This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers)

On the Monkey Cage blog, Baptiste Coulmont (a.k.a. @coulmont) recently uploaded a post entitled “You can vote twice ! The many political appeals of proxy votes in France“, coauthored with Joël Gombin (a.k.a. @joelgombin), and myself. The study was initially written in French as mentioned in a previous post. Baptiste posted additional information on his blog (http://coulmont.com/blog/…) and I also wanted to post some lines of code, to mention a model that was not used in that study (more complex to analyze, but more realistic, and with the same conclusions). The econometric study is based on aggregated voted, with a possible ecological misinterpretation.

• Regression Model: Possible Explanatory Variables

The first idea was to model proxies using a binomial regression, per pooling station $P_i\sim\mathcal{B}(N_i,p_i)$ where $P_i$ denote the number of proxy vote, per station $i$, and $N_i$ denotes the number of voters. Proportion $p_i$ can be a function of possible explanatory variables (on Baptiste’s blog there are additional information about the datasets, obtained from insee.fr and opendata.paris.fr)

> bt1=read.table("paris2007-pres-t1.csv",header=TRUE,sep=";")
> bv$BV=bv$BVCOM
> baset1=merge(bt1,bv,by="BV")
> baset2=merge(bt2,bv,by="BV")
> baset1$LOGEMENT=baset1$PROPRIO+baset1$LOCNONHLM+baset1$LOCHLM+baset1$GRATUIT > baset2$LOGEMENT=baset2$PROPRIO+baset2$LOCNONHLM+baset2$LOCHLM+baset2$GRATUIT

For instance, assume that $p_i$ is a function of the proportion of owner of the place people live in, denoted $X_i$ in the neighborhood of the pooling station,

> variable="PROPRIO"
> reference="LOGEMENT"
> baset1$taux=baset1[,variable]/baset1[,reference] > baset2$taux=baset2[,variable]/baset2[,reference]

We can consider a logistic regression

$p_i=h(X_i)=\frac{\exp[\beta_0+\beta_1 X_i]}{1+\exp[\beta_0+\beta_1 X_i]}$

or a logistic regression with splines, if we do not want to assume a linear model

$p_i=\tilde h(X_i)=\frac{\exp[s(X_i)]}{1+\exp[s(X_i)]}$

With cubic splines, the code is

> b=hist(baset1$taux,plot=FALSE) > library(splines) > regt1=glm(PROCURATIONS/INSCRITS~bs(taux,6),family=binomial,weights=INSCRITS,data=baset1) > regt2=glm(PROCURATIONS/INSCRITS~bs(taux,6),family=binomial,weights=INSCRITS,data=baset2) > u=seq(min(baset1$taux)+.015,max(baset1$taux)-.015,by=.001) > ND=data.frame(taux=u) > ug=seq(0,max(baset1$taux)+.05,by=.001)
> pt1=predict(regt1,newdata=ND,se=TRUE,type="response")
> pt2=predict(regt2,newdata=ND,se=TRUE,type="response")
> library(RColorBrewer)
> CL=brewer.pal(6, "RdBu")
> plot(ug,ug*1,col="white",xlab=nom,ylab="Taux de procuration",
+ ylim=c(0,.1))
> for(i in 1:(length(b$breaks)-1)){ + polygon(b$breaks[i+c(0,0,1,1)],c(0,b$counts[i],b$counts[i],0)
+ /max(b$counts)*.05,col="light yellow",border=NA)} > polygon(c(u,rev(u)),c(pt1$fit+2*pt1$se.fit,rev(pt1$fit-2*pt1$se.fit)), + border=NA,density=30,col=CL[4]) while a standard logistic regression would be > lines(u,pt1$fit,col=CL[6],lwd=2)
> polygon(c(u,rev(u)),c(pt2$fit+2*pt2$se.fit,rev(pt2$fit-2*pt2$se.fit)),
+ border=NA,density=30,col=CL[3])
> lines(u,pt2$fit,col=CL[1],lwd=2) > regt1l=glm(PROCURATIONS/INSCRITS~taux,family=binomial,weights=INSCRITS,data=baset1) > regt2l=glm(PROCURATIONS/INSCRITS~taux,family=binomial,weights=INSCRITS,data=baset2) > ND=data.frame(taux=ug) > pt1l=predict(regt1l,newdata=ND,se=TRUE,type="response") > pt2l=predict(regt2l,newdata=ND,se=TRUE,type="response") > lines(ug,pt1l$fit,col=CL[5],lty=2)
> lines(ug,pt2l$fit,col=CL[2],lty=2) > legend(0,.1,c("Second Tour","Premier Tour"),col=CL[c language="(1,6)"][/c], + lwd=2,lty=1,border=NA) Here it is (the confidence region is for the spline regression) with on blue the first round of the Presidential election, and in red, the second round (in France, it’s a two-round system) (the legend of the y axis is not correct). We can consider as explanatory variable the rate of H.L.M., low-cost housing or council housing, If I like the graph, unfortunately, the interpretation of coefficient $\beta_1$ might be complicated > summary(regt1l) Call: glm(formula = PROCURATIONS/INSCRITS ~ taux, family = binomial, data = baset1, weights = INSCRITS) Deviance Residuals: Min 1Q Median 3Q Max -12.9549 -1.5722 0.0319 1.6292 13.1303 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.70811 0.01516 -244.6 <2e-16 *** taux 1.49666 0.04012 37.3 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 12507 on 836 degrees of freedom Residual deviance: 11065 on 835 degrees of freedom AIC: 15699 Number of Fisher Scoring iterations: 4 > summary(regt2l) Call: glm(formula = PROCURATIONS/INSCRITS ~ taux, family = binomial, data = baset2, weights = INSCRITS) Deviance Residuals: Min 1Q Median 3Q Max -15.4872 -1.7817 -0.1615 1.6035 12.5596 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.24272 0.01230 -263.61 <2e-16 *** taux 1.45816 0.03266 44.65 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 9424.7 on 836 degrees of freedom Residual deviance: 7362.3 on 835 degrees of freedom AIC: 12531 Number of Fisher Scoring iterations: 4 So we did consider a standard linear regression model, for the proxy rate, per station, $\frac{P_i}{N_i}=\beta_0+\beta_1 X_i+\varepsilon_i$ (again, either a model with splines, or a standard linear model). The code is > regt1=lm(PROCURATIONS/INSCRITS~bs(taux,6),weights=INSCRITS,data=baset1) > regt2=lm(PROCURATIONS/INSCRITS~bs(taux,6),weights=INSCRITS,data=baset2) > u=seq(min(baset1$taux)+.015,max(baset1$taux)-.015,by=.001) > ND=data.frame(taux=u) > ug=seq(0,max(baset1$taux)+.05,by=.001)
> pt1=predict(regt1,newdata=ND,se=TRUE,type="response")
> pt2=predict(regt2,newdata=ND,se=TRUE,type="response")
> library(RColorBrewer)
> CL=brewer.pal(6, "RdBu")
> plot(ug,ug*1,col="white",xlab=nom,ylab="Taux de procuration",
+ ylim=c(0,.1))
> for(i in 1:(length(b$breaks)-1)){ + polygon(b$breaks[i+c(0,0,1,1)],c(0,b$counts[i],b$counts[i],0)
+ /max(b$counts)*.05,col="light yellow",border=NA)} > polygon(c(u,rev(u)),c(pt1$fit+2*pt1$se.fit,rev(pt1$fit-2*pt1$se.fit)), + border=NA,density=30,col=CL[4]) > lines(u,pt1$fit,col=CL[6],lwd=2)
> polygon(c(u,rev(u)),c(pt2$fit+2*pt2$se.fit,rev(pt2$fit-2*pt2$se.fit)),
+ border=NA,density=30,col=CL[3])
> lines(u,pt2$fit,col=CL[1],lwd=2) > regt1l=lm(PROCURATIONS/INSCRITS~taux,weights=INSCRITS,data=baset1) > regt2l=lm(PROCURATIONS/INSCRITS~taux,weights=INSCRITS,data=baset2) > ND=data.frame(taux=ug) > pt1l=predict(regt1l,newdata=ND,se=TRUE,type="response") > pt2l=predict(regt2l,newdata=ND,se=TRUE,type="response") > lines(ug,pt1l$fit,col=CL[5],lty=2)
> lines(ug,pt2l$fit,col=CL[2],lty=2) > legend(0,.1,c("Second Tour","Premier Tour"),col=CL[c language="(1,6)"][/c], + lwd=2,lty=1,border=NA) Here is again the evolution as a function of the rate of owner of their homes, The graph is rather close to the one before, and here, the interpretation of the summary table is more conventional, > summary(regt1l) Call: lm(formula = PROCURATIONS/INSCRITS ~ taux, data = baset1, weights = INSCRITS) Weighted Residuals: Min 1Q Median 3Q Max -1.9994 -0.2926 0.0011 0.3173 3.2072 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.021268 0.001739 12.23 <2e-16 *** taux 0.054371 0.004812 11.30 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.646 on 835 degrees of freedom Multiple R-squared: 0.1326, Adjusted R-squared: 0.1316 F-statistic: 127.7 on 1 and 835 DF, p-value: < 2.2e-16 > summary(regt2l) Call: lm(formula = PROCURATIONS/INSCRITS ~ taux, data = baset2, weights = INSCRITS) Weighted Residuals: Min 1Q Median 3Q Max -2.9029 -0.4148 -0.0338 0.4029 3.4907 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.033909 0.001866 18.17 <2e-16 *** taux 0.079749 0.005165 15.44 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.6934 on 835 degrees of freedom Multiple R-squared: 0.2221, Adjusted R-squared: 0.2212 F-statistic: 238.4 on 1 and 835 DF, p-value: < 2.2e-16 We have used those codes to produce the graphs mentioned in the post. But before mentioning the residuals of the multiple model we considered, I wanted to share some awesome code that produce maps (I can say that those codes are awesome since Baptiste wrote most of them). • Visualization of Residuals on a Map of Paris To plot the neighborhood of the pooling stations, one more time the post on Baptiste’s blog, explains how the shapefile was obtained from cartelec.net > library(maptools) > library(rgdal) > library(classInt) > paris=readShapeSpatial("paris-cartelec.shp") To visualize the proxy rate (the average of round one and round two), here is the code > elec=data.frame() > elec=cbind(bt1$BV,(bt1$PROCURATIONS+bt2$PROCURATIONS),(bt1$EXPRIMES+bt2$EXPRIMES))
> colnames(elec)=c("BV","PROCURATIONS","EXPRIMES")
> elec=as.data.frame(elec)
> elec$BV=bt1$BV

To get nice colors, function of the rates, we use

> m=match(paris$BUREAU,elec$BV)
> plotvar=100*elec$PROCURATIONS/elec$EXPRIMES
> nclr=7
> plotclr=brewer.pal(nclr,"RdYlBu")[nclr:1]
>(plotvar[m], nclr, style="fisher",dataPrecision=1)
> colcode=findColours(class, plotclr)

and finally

> par(mar=c(1,1,1,1))
> plot(paris,col=colcode,border=colcode)
> legend(656274.9, 6867308,legend=names(attr(colcode,"table")),
+ fill=attr(colcode, "palette"), cex=1, bty="n",
+ title="Frequence procurations (%)")

If we consider a model with three explanatory variable, to explain the proxy rate,

> regt1=lm(PROCURATIONS/INSCRITS~I(POP65P/POP)+
+ I(PROPRIO/LOGEMENT)+I(CS3/POP1564),weights=INSCRITS,data=baset1)

we can plot the residuals using

> m=match(paris$BUREAU,elec$BV)
> plotvar=100*residuals(regt1)
> nclr=7
> plotclr=brewer.pal(nclr,"RdYlBu")[nclr:1]
>(plotvar[m], nclr, style="fisher",dataPrecision=1)
> colcode=findColours(class, plotclr)
> par(mar=c(1,1,1,1))
> plot(paris,col=colcode,border=colcode)
> legend(656274.9, 6867308,legend=names(attr(colcode,"table")),
+ fill=attr(colcode, "palette"), cex=1, bty="n",title="Residus")

It might not be a pure random spatial noise… But we could not get better with our small set of covariates.