The Budget Compromise: Mining Tweets
[This article was first published on Econometric Sense, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
After a week at SAS Global Forum, I’ve been pretty excited about some of the text mining presentations I got to see, and I couldn’t wait to get back to work to at least try something. After getting home I found a tweet from @imusicmash sharing a post from the Heuristic Andrew blog with text mining code in R. I thought I’d use that code to mine tweets related to the budget compromise/government shutdown. I searched two hashtags, #budget and #teaparty. I originally wanted to see if I could find out either what teaparty supporters were saying about the budget, or what others were saying about the teaparty and the potential government shutdown, since these were interesting topics.
After extracting the text from a sample of 1500 tweets, I found the following ‘most common’ terms (shown as raw R output):
[1] “cnn” “boehner” “gop” “time” “words” “budget” “historic” “cuts” “billion” “cut”
[11] “cspan” “deal” “heh” “movement” “table” “parenthood” “planned” “tcot” “hee” “andersoncooper”
[21] “funded” “neener” “protecting” “painful” “alot” “party” “tea” “shutdown” “government” “beep”
[31] “jobs” “barackobama” “feels” “drinkers” “koolaid” “deficit” “facing” “trillion” “gown” “wedding”
[41] “healthcare” “spending” “2012” “agree” “plan” “compromise” “victory” “term” “tax” “decorating”
[51] “diy” “home” “bank” “dems” “biggest” “history” “hug” “civil” “hoo” “little”
[61] “38b” “tips” “life” “people” “suicide” “doesnt” “wars” “trump” “system” “books”
[71] “teaparty” “ventura” “etatsunis” “fair” “fight” “military” “actually” “win” “compulsive” “liars”
[81] “tbags” “revenue” “rights” “libya” “base” “elites” “house” “crisis” “housing” “hud”
[91] “dem” “nay” “yea”
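For readers who want to see the idea behind a ‘most common terms’ list, here is a minimal base R sketch. It is not the code I ran (that came from the Heuristic Andrew post and uses the ‘tm’ package, where findFreqTerms() on a document-term matrix plays a similar role). The tweets, the stop-word list, and the min_freq cutoff below are all made up for illustration.

```r
# Toy stand-ins for tweet text (hypothetical, not the 1500-tweet sample)
tweets <- c(
  "Budget deal reached, shutdown avoided #budget",
  "Boehner and the GOP agree to budget cuts #teaparty",
  "Government shutdown looms over budget fight",
  "Tea party wants deeper spending cuts"
)

# Lowercase, keep only letters and spaces, then split on whitespace
clean  <- gsub("[^a-z ]", "", tolower(tweets))
tokens <- strsplit(clean, " +")

# Drop empty strings and a tiny illustrative stop-word list (an assumption)
stopwords <- c("the", "and", "to", "over", "a")
tokens <- lapply(tokens, function(x) x[nchar(x) > 0 & !(x %in% stopwords)])

# Count how often each term occurs across all tweets
freq <- sort(table(unlist(tokens)), decreasing = TRUE)

# Keep terms that occur at least min_freq times (cutoff chosen arbitrarily)
min_freq <- 2
common <- names(freq[freq >= min_freq])
print(common)
```

On a real sample the cutoff matters a lot: set it too low and the list fills up with noise, too high and only the search hashtags survive.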
I then clustered the terms, using hierarchical clustering to get the following groupings:
Although the results aren’t extremely revealing or novel, the clusters do make sense: key politicians land in the same group as the terms ‘government’ and ‘shutdown’, and republicans land in the same group as ‘teaparty’-related terms. For me, this at least validates the power of R’s ‘tm’ text mining package. I’m in the ballpark, and a better structured analysis could give better results. But this is just for fun.
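To make the clustering step concrete, here is a rough sketch in base R of hierarchical clustering on terms. The document-term matrix is a hand-made toy (the counts are hypothetical), and the Euclidean distance and complete linkage are my assumptions for the example, not necessarily the settings behind the groupings above.

```r
# Toy document-term matrix: rows are tweets, columns are terms.
# The 0/1 entries are invented purely for illustration.
dtm <- matrix(
  c(1, 1, 1, 0, 0, 0,
    1, 1, 0, 0, 0, 0,
    1, 0, 1, 0, 0, 0,
    0, 0, 0, 1, 1, 0,
    0, 0, 0, 1, 1, 1,
    0, 0, 0, 1, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    paste0("tweet", 1:6),
    c("government", "shutdown", "boehner", "teaparty", "gop", "tcot")
  )
)

# Transpose so each term is a row, then measure how differently
# two terms are distributed across the tweets
term_dist <- dist(t(dtm), method = "euclidean")

# Agglomerative clustering; the linkage method is an assumption
fit <- hclust(term_dist, method = "complete")

# Cut the tree into two groups and list the terms in each
groups <- cutree(fit, k = 2)
print(split(names(groups), groups))

# plot(fit) would draw the dendrogram
```

Terms that tend to show up in the same tweets end up close together, which is why co-occurring names and topics fall into the same branch.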
I also ran some correlations, which get me about as close to sentiment as I’m going to get for a first stab at text mining:
For ‘budget’ some of the more correlated terms included deal, shutdown, balanced, reached, danger.
For ‘teaparty’ some of the more correlated terms included boehner, farce, selling, trouble, teapartynation.
For ‘barakobama’ (as it appeared in the text data) some of the more correlated terms included appreciate, cooperation, rejects, hypocrite, achieved, lunacy.
For ‘boehner’ some of the more correlated terms included selling, notwinning, teaparty, moderate, retarded.
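The correlations above can be sketched roughly as follows: treat each term as a column of a document-term matrix and correlate one term’s column with all the others. This is a base R illustration on invented data, with a 0.5 threshold picked arbitrarily. In ‘tm’, findAssocs() serves a similar purpose, but I’m not claiming this reproduces its output.

```r
# Toy document-term matrix: rows are tweets, columns are terms.
# Entries are hypothetical presence/absence flags.
dtm <- matrix(
  c(1, 1, 1, 0,
    1, 1, 1, 0,
    1, 1, 0, 0,
    1, 0, 1, 0,
    0, 0, 0, 1,
    0, 0, 0, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    paste0("tweet", 1:6),
    c("budget", "deal", "shutdown", "wedding")
  )
)

target <- "budget"

# Pearson correlation between the target term and every term
cors <- cor(dtm)[target, ]

# Drop the target's correlation with itself
cors <- cors[names(cors) != target]

# Keep terms at or above an arbitrary correlation threshold
min_cor <- 0.5
assoc <- sort(cors[cors >= min_cor], decreasing = TRUE)
print(round(assoc, 2))
```

A high correlation only says two terms tend to appear in the same tweets. Whether the tweet is supportive or sarcastic still takes a human read, which is why this is only a rough proxy for sentiment.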
To leave a comment for the author, please follow the link and comment on their blog: Econometric Sense.