Forecasting Presidential Elections

[This article was first published on "R-bloggers" via Tal Galili in Google Reader, and kindly contributed to R-bloggers].

Because of Andrew Gelman’s strong, repeated recommendations, I’ve been reading “Forecasting Presidential Elections” by Steven J. Rosenstone for the last two days. It’s a remarkable book, and complex enough that I’m sure I’ll return to it many times after I’ve finished it. I was particularly intrigued by a table in the first chapter that records the performance of the Gallup poll over the years. The very first Gallup polls were conducted well in advance of the elections, and it seems that Gallup thought this had been a substantial source of error in his predictions. A review of the data in that table, though, makes clear that there is essentially no meaningful relationship between how many days before the election a poll was begun and how accurate that poll was. If one assumes that the Gallup polls accurately measured the state of public opinion on the dates when they were conducted, the data suggest that in some years the public makes up its mind well in advance of the election, while in other years it decides almost immediately before the election.

To make all of this clear, I graphed the data describing the accuracy of all of the Gallup electoral polls from 1936 to 1980 as a function of how far ahead of the election the poll was conducted. As you can see, there’s virtually no pattern: the light blue line in the chart is the least squares line and the gray line is simply a horizontal line plotted at 0. The least squares line reflects a statistically insignificant correlation of 0.20.

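[Figure: gallup.png. Accuracy of the Gallup electoral polls from 1936 to 1980, plotted against how far ahead of the election each poll was conducted, with the least squares line in light blue and a gray horizontal line at 0.]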
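For readers who want to check this kind of claim themselves, here is a minimal sketch of the arithmetic behind a least squares line and a significance check on a correlation. It is not the code used to produce the chart, and it contains no data from Rosenstone's table. The function names (`pearson_r`, `least_squares_line`, `t_statistic`) are hypothetical helpers, and the sample size of 12 is an assumption: one poll per presidential election from 1936 to 1980.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def t_statistic(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# The correlation reported in the post, with an ASSUMED sample size of 12
# (one Gallup poll per election, 1936 through 1980).
t = t_statistic(0.20, 12)
print(round(t, 2))
```

Under that assumed sample size the t statistic comes out near 0.65, well short of the two-sided 5% critical value of about 2.228 for 10 degrees of freedom, which is consistent with calling the correlation statistically insignificant. With the actual days-ahead and error columns in hand, `pearson_r` and `least_squares_line` would take them as `xs` and `ys`.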

Another point that struck me while reading Rosenstone’s book was his discussion of the non-response error of polls. Non-response error is normally explained as the error in telephone polls caused by people who don’t pick up their phones, people who refuse to be surveyed and people who simply don’t own phones. The first two problems always make sense to me, but I always find myself wondering about the third: how many Americans who don’t own phones actually vote in presidential elections? I’d love to have an answer to that question if anyone has relevant data.
