The Budget Compromise: Mining Tweets

April 9, 2011

(This article was first published on Econometric Sense, and kindly contributed to R-bloggers)

After a week at SAS Global Forum, I’ve been pretty excited about some of the text mining presentations that I got to see, and I couldn’t wait to get back to work to at least try something. After getting home I found a tweet from @imusicmash sharing a post from the Heuristic Andrew blog that included text mining code for R. I thought I’d use that code to mine tweets related to the budget compromise/government shutdown. I searched two hashtags, #budget and #teaparty. I originally wanted to see if I could find out either what Tea Party supporters were saying about the budget, or what others were saying about the Tea Party and the potential government shutdown, since these were interesting topics.

After extracting the text from a sample of 1500 tweets, I found the following ‘most common’ terms (shown as raw R output):

[1] “cnn”            “boehner”        “gop”            “time”           “words”          “budget”         “historic”       “cuts”           “billion”        “cut”          
[11] “cspan”          “deal”           “heh”            “movement”       “table”          “parenthood”     “planned”        “tcot”           “hee”            “andersoncooper”
[21] “funded”         “neener”         “protecting”     “painful”        “alot”           “party”          “tea”            “shutdown”       “government”     “beep”         
[31] “jobs”           “barackobama”    “feels”          “drinkers”       “koolaid”        “deficit”        “facing”         “trillion”       “gown”           “wedding”      
[41] “healthcare”     “spending”       “2012”           “agree”          “plan”           “compromise”     “victory”        “term”           “tax”            “decorating”   
[51] “diy”            “home”           “bank”           “dems”           “biggest”        “history”        “hug”            “civil”          “hoo”            “little”       
[61] “38b”            “tips”           “life”           “people”         “suicide”        “doesnt”         “wars”           “trump”          “system”         “books”        
[71] “teaparty”       “ventura”        “etatsunis”      “fair”           “fight”          “military”       “actually”       “win”            “compulsive”     “liars”        
[81] “tbags”          “revenue”        “rights”         “libya”          “base”           “elites”         “house”          “crisis”         “housing”        “hud”          
[91] “dem”            “nay”            “yea”    
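
For readers who want to try something similar, here is a minimal sketch of how a list of frequent terms can be produced with the ‘tm’ package. It is not the code used for this post: the four sample tweets, the cleaning steps, and the frequency threshold are all assumptions chosen for illustration.

```r
library(tm)

# Hypothetical stand-in tweets; the real sample of 1500 is not reproduced here
tweets <- c(
  "Boehner says budget deal reached, shutdown avoided #budget",
  "GOP and Dems agree on historic budget cuts #teaparty",
  "Budget deal cuts billions in spending #budget",
  "Government shutdown averted after budget compromise"
)

# Build a corpus and apply some common cleaning steps (assumed, not the author's)
corpus <- VCorpus(VectorSource(tweets))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Rows are tweets, columns are terms
dtm <- DocumentTermMatrix(corpus)

# Terms that occur at least 3 times across the whole sample
freq_terms <- findFreqTerms(dtm, lowfreq = 3)
print(freq_terms)
```

With a real sample, the lowfreq threshold would need tuning to get a list of a manageable size.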
I then clustered the terms using hierarchical clustering to get the following groupings:
[Figure: hierarchical clustering of the terms]
Although the results aren’t extremely revealing or novel, the clusters do make sense: key politicians land in the same group as the terms ‘government’ and ‘shutdown’, and Republicans land in the same group as ‘teaparty’-related terms. This at least validates, for me, the power of R’s ‘tm’ text mining package. I’m in the ballpark, and a better-structured analysis could give better results. But this is just for fun.
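
As a rough illustration of term clustering, the sketch below runs base R’s hclust on a tiny made-up term count matrix. The counts, the Euclidean distance, the "ward.D2" linkage, and the choice of two clusters are all assumptions; in practice the matrix would come from something like as.matrix() applied to a document-term matrix.

```r
# Hypothetical term counts: rows are tweets, columns are terms
m <- matrix(
  c(1, 1, 0, 0,
    1, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 1, 1,
    1, 1, 0, 0),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("government", "shutdown", "tea", "party"))
)

# We want to cluster terms, not tweets, so transpose to put terms in rows
d <- dist(t(m), method = "euclidean")
fit <- hclust(d, method = "ward.D2")

# Draw the dendrogram, then cut the tree into two groups of terms
plot(fit)
groups <- cutree(fit, k = 2)
print(groups)
```

Terms that tend to show up in the same tweets have similar count columns, so they end up close together in the tree.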
I also ran some correlations, which get me about as close to sentiment as I’m going to get for a first-time stab at text mining:
For ‘budget’ some of the more correlated terms included deal, shutdown, balanced, reached, danger.
For ‘teaparty’ some of the more correlated terms included boehner, farce, selling, trouble, teapartynation.
For ‘barackobama’ (as it appeared in the text data) some of the more correlated terms included appreciate, cooperation, rejects, hypocrite, achieved, lunacy.
For ‘boehner’ some of the more correlated terms included selling, notwinning, teaparty, moderate, retarded.
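
The idea behind these term correlations can be sketched in base R: correlate the count column of one term with the count columns of every other term and keep the strongest ones. The matrix, the helper name term_assocs, and the 0.5 cutoff are hypothetical; the ‘tm’ package offers findAssocs for the same kind of lookup on a document-term matrix.

```r
# Hypothetical term counts: rows are tweets, columns are terms
m <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    1, 0, 1,
    0, 0, 1,
    1, 1, 0,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("budget", "deal", "teaparty"))
)

# Return the terms whose Pearson correlation with `term` is at least `min_cor`
term_assocs <- function(m, term, min_cor = 0.5) {
  r <- cor(m)[term, ]              # correlations of `term` with every column
  r <- r[names(r) != term]         # drop the term's correlation with itself
  sort(r[r >= min_cor], decreasing = TRUE)
}

assocs <- term_assocs(m, "budget", min_cor = 0.5)
print(round(assocs, 2))
```

A high correlation only says that two terms tend to appear in the same tweets, which is why this is a loose proxy for sentiment at best.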
