Presidential Candidate Sentiment Analysis

October 7, 2012

(This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers)

After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the harebrained idea of writing a simple function that would automate pulling down tweets about each candidate, score how positive or negative they are, and then graph the results. The project turned out to be a lot easier than I expected, even after playing the debate drinking game.

I started by reading a SlideShare presentation by Jeffrey Breen on airline sentiment analysis, and ended up using his score.sentiment() function with only a very minor tweak (line 6 of my version removes foreign characters). The other things you need are the Opinion Lexicon compiled by Minqing Hu and Bing Liu, an excellent collection of about 6,800 words labeled as positive or negative, and the twitteR R package. All of the code is on my GitHub.
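For readers who haven't seen Breen's slides, here is a minimal base-R sketch of the same idea. It cleans each tweet, splits it into words, and scores it as the number of positive matches minus the number of negative matches. This is not Breen's exact code, since his version uses plyr and stringr. The iconv() call is my assumption about what a "remove foreign characters" step could look like, and the name score_sentiment_sketch() is hypothetical.

```r
# Hypothetical word-matching scorer in the spirit of Breen's score.sentiment()
score_sentiment_sketch <- function(sentences, pos.words, neg.words) {
  scores <- vapply(sentences, function(sentence) {
    # Assumption: drop characters that can't be represented in ASCII
    sentence <- iconv(sentence, from = "UTF-8", to = "ASCII", sub = "")
    # Strip punctuation, control characters and digits, then lowercase
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    sentence <- tolower(sentence)
    # Split on whitespace so matching happens one word at a time
    words <- unlist(strsplit(sentence, "\\s+"))
    # Score = positive matches minus negative matches
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Made-up mini-lexicon and tweets for illustration
pos <- c("great", "strong")
neg <- c("weak", "terrible")
res <- score_sentiment_sketch(c("Great debate, strong answers!",
                                "Terrible, weak night."), pos, neg)
res$score  # 2 -2
```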

While the lexicon is a pretty complete collection, you will need to add politics-specific words. After loading the two files, you can easily add to them.

# Load the Hu & Liu word lists; header lines starting with ';' are skipped as comments
positive.words <- scan("~/Downloads/opinion-lexicon-English/positive-words.txt", what = 'character', comment.char = ';')
negative.words <- scan("~/Downloads/opinion-lexicon-English/negative-words.txt", what = 'character', comment.char = ';')

Add new positive or negative words by simply appending them to the original list, like:

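negative.words <- c(negative.words, "one percent")

One caveat: score.sentiment() splits each tweet into single words before matching, so a multi-word entry such as "one percent" will never actually be counted. Stick to single-word additions.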
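If a phrase really matters, one possible workaround is to glue it into a single token in the tweet text before scoring, and then add that token to the list. This idea is my own assumption and is not part of the original code. The sketch below uses a hypothetical mini-lexicon, a hypothetical helper count_negative(), and "bailout" as an example political term.

```r
# Hypothetical stand-in for the Hu & Liu negative list
negative.words <- c("bad", "weak")
# unique() keeps the list free of duplicates if a term is already present
negative.words <- unique(c(negative.words, "bailout", "one percent", "onepercent"))

# Hypothetical helper: word-level matching, like score.sentiment()
count_negative <- function(tweet) {
  words <- unlist(strsplit(tolower(tweet), "\\s+"))
  sum(words %in% negative.words)
}

tweet <- "Another bailout for the one percent"
count_negative(tweet)  # 1: only "bailout" matches; "one percent" never does
# Workaround: turn the phrase into one token before splitting
count_negative(gsub("one percent", "onepercent", tweet, fixed = TRUE))  # 2
```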

Once you’ve added your words and loaded the functions from my GitHub (the .R files) along with the required packages, all you have to do is type the following and you’re done:

data <- president()
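president() itself lives in the GitHub code and isn't reproduced here. To give a rough idea of the shape of its output, here is a hypothetical president_sketch() that scores tweets per candidate and stacks them into one data frame. Hard-coded example tweets stand in for the twitteR download, and the scoring is a simplified word-match count rather than the exact score.sentiment().

```r
# Hypothetical simplified scorer: positive matches minus negative matches
score_tweets <- function(tweets, pos.words, neg.words) {
  words <- strsplit(tolower(gsub("[[:punct:]]", "", tweets)), "\\s+")
  vapply(words, function(w) sum(w %in% pos.words) - sum(w %in% neg.words),
         numeric(1))
}

# Hypothetical wrapper: one block of rows per candidate, stacked together
president_sketch <- function(tweets.by.candidate, pos.words, neg.words) {
  per.candidate <- lapply(names(tweets.by.candidate), function(name) {
    tweets <- tweets.by.candidate[[name]]
    data.frame(candidate = name, text = tweets,
               score = score_tweets(tweets, pos.words, neg.words),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per.candidate)
}

# Made-up tweets in place of a live twitteR search
tweets <- list(
  Obama  = c("Strong finish tonight", "Weak answer on jobs"),
  Romney = c("Great debate performance", "Good energy, strong closing")
)
data <- president_sketch(tweets, c("strong", "great", "good"), c("weak"))
tapply(data$score, data$candidate, mean)  # mean score per candidate
```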

This gives you a histogram of the tweet scores, with the mean marked by a dotted line, and a data frame of all the tweets and the score for each one.
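If you want to draw that kind of plot yourself, base R graphics are enough. This sketch uses made-up scores and is not the plotting code inside president(). hist() draws the distribution, and abline() with lty = "dotted" marks the mean.

```r
# Made-up scores standing in for the score column of the data frame
scores <- c(-2, -1, 0, 0, 1, 1, 1, 2, 3)

pdf(NULL)  # draw off-screen so the sketch runs non-interactively; omit in a live session
# Breaks at half-integers centre one bar on each integer score
h <- hist(scores, breaks = seq(min(scores) - 0.5, max(scores) + 0.5, by = 1),
          main = "Tweet sentiment scores", xlab = "Score")
abline(v = mean(scores), lty = "dotted", lwd = 2)  # dotted mean line
invisible(dev.off())
```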

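The higher the score, the more positive the tweet. Each tweet's score is the number of positive-lexicon words it contains minus the number of negative-lexicon words.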
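To make the arithmetic concrete, here is a tweet already split into words, scored Breen-style (positive matches minus negative matches) against a made-up mini-lexicon:

```r
# Made-up mini-lexicon for illustration
pos.words <- c("great", "win", "strong")
neg.words <- c("lies", "weak")

words <- c("great", "night", "but", "weak", "answers", "and", "lies")
positive.hits <- sum(words %in% pos.words)  # 1 ("great")
negative.hits <- sum(words %in% neg.words)  # 2 ("weak", "lies")
positive.hits - negative.hits               # -1: a mildly negative tweet
```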

It would be really interesting to track sentiment over time (you can only pull down the 1,500 most recent tweets at a time) and connect it with other variables such as macroeconomic indicators, poll results, and ad spending, but I just can’t devote that much time to a side project. If you add to this project, let me know how it turns out.
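If someone does pick this up, a first step could be to save each batch of scored tweets with its timestamp and then average the scores by day. This is a minimal sketch with invented data. The column names created, score and day are hypothetical.

```r
# Invented scored tweets with timestamps (UTC)
scored <- data.frame(
  created = as.POSIXct(c("2012-10-03 21:05", "2012-10-03 22:40",
                         "2012-10-04 09:15", "2012-10-04 12:30"), tz = "UTC"),
  score = c(2, -1, 0, -3)
)
# Collapse timestamps to calendar days, then take the mean score per day
scored$day <- as.Date(scored$created)
daily <- aggregate(score ~ day, data = scored, FUN = mean)
daily  # one row per day: 0.5 on 2012-10-03, -1.5 on 2012-10-04
```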
