#PredictiveCOL – Forecasting Colombia’s peace plebiscite (final update)

[This article was first published on Data Literacy - The blog of Andrés Gutiérrez, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

For sure, this is the more exciting forecast I have ever done. On one hand, I am Colombian guy, and I really want to live in a peaceful country, and I do want a better place for raising my children. On the other hand, I am very serious when it comes to forecasting.

Maybe you have read this blog before, and you have realized that I love predictions cuz, as a data scientist, you can use your own set of statistical methodologies that relates to voting intention and turnout. If you think genuinely, everything counts while forecasting this kind of processes. For example, you may weekly track people’s security perception, how much the president is disliked, how many minutes newscast spend about the peace process, how many articles or opinion columns are written, how many tweets are sent per week about FARC, everything, everything counts and matter. Let us name these variables as contextual variables

Of course, polls and pollsters are also important. They are the only proxy of voting intention. However, polls get old, and they should be updated with new data; also some pollsters are not as confident as others (I really do not believe everything some pollsters claim), and sample size also matters cuz it improves the sampling error. 

Remember that this plebiscite has two options for Colombians to choose. The first option is Yes, that goes for Yes, I approve the agreement between Farc and government. The second option is No, meaning No, I don’t support that agreement. With those topics in mind, let me introduce you my (bayesian) predictions for the Colombian plebiscite. Firstly, let me exhibit the trend that the two options have shown over time:
As you can see, we have deflated the data by undecided voters. Note that we extracted the signal (bold lines) from the noise generated by polls. The following graph shows the less robust prediction based only on what polls have estimated.

Now, a more reasonable forecast based on prior information (2014 presidential elections and 2014 legislative elections) that tries to explain the outcomes of the polls by modeling the extracted signal (see trends in previous graphic) with contextual variables. After the model is fitted, we use a Bayesian setup that relates the prior information with the estimated response from this model. The following chart shows this forecast.
Ok, it is clear that Colombian people will support this peace process. However, this election is legitimate if and only if Yes voter turnout is greater than 4.4 million. By using a similar Bayesian methodology, next graph shows the predicted turnout.
Also, we used a small area modeling to forecast the response of every Colombian department (equivalent to a state in the US). The following map shows that the majority of departments are supporting the agreement between FARC and government. However, there are some of them that will vote No. Dark areas are not supporting the deal, while light-gray areas will do support the agreement.
Finally, the posterior probability that Yes defeats No is 98.8%.

To leave a comment for the author, please follow the link and comment on their blog: Data Literacy - The blog of Andrés Gutiérrez.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)