[This article was first published on PirateGrunt, and kindly contributed to R-bloggers].
I saved the data from the last post, which shows the percentage of Republican voters in each county. In addition to that column, I also have figures from the 2010 census covering things like age, ethnicity, urbanization and home ownership. Those census figures are raw population counts, so they'll need to be converted to proportions before they can be used in any statistical inference. That means reading through the data frame's obscure column names, which the USCensus package documents well.
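A minimal sketch of that count-to-proportion step follows. It is written in Python rather than R. It uses made-up counts and hypothetical column names; the USCensus data frame uses its own column codes. The main decision is the denominator. Age counts are divided by total population, and housing counts by total housing units.

```python
# Hypothetical field names; the USCensus data frame uses its own column codes.
def add_shares(row, spec):
    """Return a copy of `row` with one share column per (name, numerator, denominator)."""
    out = dict(row)
    for name, numerator, denominator in spec:
        total = row[denominator]
        # Guard against an empty denominator (e.g. a county with no housing units).
        out[name] = row[numerator] / total if total else float("nan")
    return out


# Made-up counts for a single county.
county = {
    "pop_total": 50000, "pop_18_24": 6000,
    "units_total": 20000, "units_renter": 5000,
}

# Each share is divided by the total it belongs to: people by people,
# housing units by housing units.
spec = [
    ("pct_18_24", "pop_18_24", "pop_total"),
    ("pct_renter", "units_renter", "units_total"),
]

shares = add_shares(county, spec)
print(shares["pct_18_24"], shares["pct_renter"])
```

Keeping the raw counts alongside the shares makes it easy to spot-check a few counties against the census tables.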
I'll note two things about the ethnic categories. 1) In pretty much every society on earth, race is a very sensitive, divisive issue with a great deal of history. I'll add the hopelessly needless caveat that although ethnicity may be used in a statistical model, that shouldn't suggest that it places any constraints on a person's behavior or ability. 2) Perhaps because of point 1, the US Census collects very detailed data on race. I'm not going to try to sort through all of the nuance captured in the data. Instead, I'll create a single data element: the percentage of the population that identifies as white in any of the several categories where it is possible to do so.
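The sketch below shows one way to collapse the race detail into a single "percent white" figure. It is Python, not the post's R code. It assumes the race table is a set of mutually exclusive combination counts keyed by labels such as "White; Asian". That label format is hypothetical and not the package's. Because each person is counted in exactly one combination, summing every cell that mentions White counts nobody twice.

```python
def pct_white(race_counts):
    """Share of people whose reported race includes White, alone or in combination.

    `race_counts` maps a race-combination label such as "White; Asian" to a
    head count. Each person falls in exactly one combination, so summing the
    matching cells does not double count anyone.
    """
    total = sum(race_counts.values())
    white = sum(
        count
        for label, count in race_counts.items()
        if "White" in [part.strip() for part in label.split(";")]
    )
    return white / total if total else float("nan")


# Made-up counts for one county, using the hypothetical label format.
county = {
    "White": 700,
    "Black": 200,
    "White; Black": 40,
    "Asian": 30,
    "Asian; White": 30,
}
print(pct_white(county))
```

If only some of the combination cells are loaded, divide by the county's total population column instead of the sum of the loaded cells.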
Everybody cool? Good, let’s do some math.
Unfortunately, the urbanization column isn't available in this dataset. That's a shame, as I would imagine it's very predictive. Later, I'll try to find it elsewhere, or create a proxy variable by computing population density.
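A population-density proxy needs a land area for each county in addition to the population count. The Census Bureau publishes land area separately, and here it is assumed to be in square miles. Below is a minimal Python sketch with made-up counties. The log transform is a design choice rather than something from the post.

```python
import math


def population_density(population, land_area_sq_mi):
    """People per square mile of land area."""
    if land_area_sq_mi <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_sq_mi


# Hypothetical counties: (population, land area in square miles).
urban = population_density(900_000, 500)
rural = population_density(20_000, 400)

# Density is usually heavily right-skewed across counties, so a log scale
# keeps a handful of dense urban counties from dominating a linear fit.
log_urban, log_rural = math.log(urban), math.log(rural)
print(urban, rural, log_urban, log_rural)
```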
The plots suggest that counties with a large share of rental housing are less apt to vote Republican. However, when all variables are included in the model, neither the sign nor the significance of that coefficient is what we'd expect. I'll change the column so that it measures the percentage of owned homes, drop a few of the insignificant variables, and fit again.
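Recoding the renter share as an owned share does not change the model on its own. Suppose owned = 1 - renter, which holds when both are shares of occupied units, since every occupied unit is either owned or rented. A linear model then flips the sign of that coefficient, shifts the intercept, and leaves the standard error, significance and fitted values unchanged. Any difference after the refit would therefore come from dropping the other variables. The simple least-squares sketch below checks this in Python on made-up numbers, not the post's data.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up county data: renter-occupied share and Republican vote share.
renter = [0.20, 0.35, 0.25, 0.40, 0.30]
gop = [0.62, 0.45, 0.58, 0.41, 0.50]

# Owned share of occupied units is the complement of the renter share.
owned = [1 - r for r in renter]

b0_r, b1_r = simple_ols(renter, gop)
b0_o, b1_o = simple_ols(owned, gop)

# Same line, reparameterised: the slope flips sign and the intercept absorbs it.
print(f"renter: intercept {b0_r:.3f}, slope {b1_r:.3f}")
print(f"owned:  intercept {b0_o:.3f}, slope {b1_o:.3f}")
```

If the denominator were all housing units, vacant ones included, the two shares would no longer sum to 1, and the relationship would only be approximate.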
Ownership still shows up as insignificant, which is just odd. One final fit, with only that variable.
OK. On its own it's significant, but it gets lost when mixed with the other variables.
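A variable that is significant alone but lost among the others is a typical sign of collinearity, though the post doesn't check for it. When ownership share moves closely with another predictor, the model can't separate their effects. Relative to uncorrelated predictors, the variance of each coefficient estimate is multiplied by the variance inflation factor (VIF). Below is a minimal two-predictor sketch in Python on made-up numbers. `pct_over_65` is a hypothetical second predictor, not a variable from the post.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """Variance inflation factor for either predictor in a model with just a and b.

    With only two predictors, the R^2 from regressing one on the other is the
    squared correlation, so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Made-up county data; pct_over_65 is a hypothetical second predictor.
owned = [0.80, 0.65, 0.75, 0.60, 0.70]
pct_over_65 = [0.20, 0.12, 0.18, 0.10, 0.15]

vif = vif_two_predictors(owned, pct_over_65)
print(f"VIF = {vif:.1f}")
```

With more than two predictors, the R^2 comes from regressing the variable on all of the others. A common rule of thumb treats VIFs above roughly 5 to 10 as a warning sign.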
What does all of this mean? It means that, for this set of explanatory variables and this construction of the data, and absent any significant demographic shifts, we can probably expect North Carolina to remain red. An influx of non-white or younger residents could change that. I'll emphasize that this is a very superficial treatment of complex phenomena. In a later post, I'll augment the basic census data with other data elements. I'll also try to fetch data for other states to see how the relationships observed here play out elsewhere in the country.
This is also the part where I point out that Nate Silver and Andrew Gelman, two people who are reliably smarter than I am, have written about political forecasting in a way that I can't hope to replicate. I've read their stuff and it's tremendous. You should do the same.