Presidential Candidate Sentiment Analysis

[This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the hair brained idea of creating a simple function that would do automate the pulling down of tweets for each candidate, analyze the positivity or negativity of tweets, and then graph them out. This project turned out to be a lot easier than I thought even after playing the debate drinking game.

I started out reading a slide share from Jeffrey Breen on Airline sentiment analysis, from which I ended up using his score.sentiment() function with only a very minor tweak (line 6 removes foreign characters). The other thing you need are the Opinion Lexicon written by Minqing Hu and Bing Liu, which is an amazing collection of 6800 words that gauge the sentiment, and the twitteR R package.  All code can be found here.

While the Lexicon is a pretty complete collection, you will need to add political specific words.  After loading up the two files you can easily add to them.

positive.words <- scan("~/Downloads/opinion-lexicon-English/positive-words.txt", what='character',comment.char=';') negative.words <- scan("~/Downloads/opinion-lexicon-English/negative-words.txt",           what='character',comment.char=';')

Add new positive or negative words by simply merging it with the original list, like:

negative.words<-c(negative.words, "one percent")

Once you’ve added all the words, loaded in the functions from my GitHub (.R files), and packages, all you have to do is type the following and you’re done:


This will give you a histogram with mean line(dotted line) and data frame of all the tweets and scores for each one.

Obviously the higher the score the more positive the tweets.

It would be really interesting to track sentiment over time (you can only pull down 1500 most recent at a time) and connect it with other variables like macro-economic indicators, poll results, and ad spending, but I just can’t devote that much time to side project.  If you add to this project let me know how it turns out.

To leave a comment for the author, please follow the link and comment on their blog: NERD PROJECT » R project posts. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)