After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the harebrained idea of creating a simple function that would automate pulling down tweets for each candidate, analyze the positivity or negativity of each tweet, and then graph the results. This project turned out to be a lot easier than I expected, even after playing the debate drinking game.
I started out reading a SlideShare from Jeffrey Breen on airline sentiment analysis, from which I ended up using his score.sentiment() function with only a very minor tweak (line 6 removes foreign characters). The other things you need are the Opinion Lexicon compiled by Minqing Hu and Bing Liu, an amazing collection of roughly 6,800 words that gauge sentiment, and the twitteR R package. All code can be found here.
While the Lexicon is a fairly complete collection, you will need to add politics-specific words. After loading the two files, you can easily append to them.
positive.words <- scan("~/Downloads/opinion-lexicon-English/positive-words.txt", what = 'character', comment.char = ';')
negative.words <- scan("~/Downloads/opinion-lexicon-English/negative-words.txt", what = 'character', comment.char = ';')
Add new positive or negative words by simply concatenating them with the original list, for example:
negative.words <- c(negative.words, "one percent")
Once you’ve added all the words, loaded the functions from my GitHub (.R files), and attached the packages, all you have to do is run the following and you’re done:
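To give a sense of the shape of that call, here is a minimal, self-contained sketch of the word-counting approach. Note that score.sentiment.simple() below is a simplified stand-in I wrote for illustration, not Breen's actual score.sentiment() (his version also handles punctuation stripping and NA control differently), and the sample word lists and tweets are made up:

```r
# Simplified sentiment scorer: for each piece of text, count matches
# against the positive and negative word lists and take the difference.
score.sentiment.simple <- function(sentences, pos.words, neg.words) {
  scores <- sapply(sentences, function(sentence) {
    # lowercase and strip everything except letters and spaces
    sentence <- gsub('[^a-z ]', ' ', tolower(sentence))
    words <- unlist(strsplit(sentence, '\\s+'))
    sum(!is.na(match(words, pos.words))) - sum(!is.na(match(words, neg.words)))
  }, USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Tiny illustrative lexicon and sample tweets (hypothetical)
positive.words <- c('win', 'strong', 'best')
negative.words <- c('fail', 'weak')
tweets <- c("Obama had a strong debate, best night yet",
            "Romney looked weak and seemed to fail on specifics")

result <- score.sentiment.simple(tweets, positive.words, negative.words)
result$score  # 2 for the first tweet, -2 for the second
```

With twitteR authenticated, the real pull for a candidate would look something like searchTwitter('@BarackObama', n = 1500), followed by extracting the text of each status with sapply(tweets, function(t) t$getText()) before scoring.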
Obviously, the higher the score, the more positive the tweets.
It would be really interesting to track sentiment over time (you can only pull down the 1,500 most recent tweets at a time) and connect it with other variables like macroeconomic indicators, poll results, and ad spending, but I just can't devote that much time to a side project. If you build on this, let me know how it turns out.