Minute by Minute Twitter Sentiment Timeline from the VP debate

October 12, 2012

(This article was first published on NERD PROJECT » R project posts, and kindly contributed to R-bloggers)

Background

The data for this graph was collected automatically about every 60 seconds during the VP debate on 10/11/2012, for an aggregate sample of 363,163 tweets. Duplicate tweets were then removed (because of bots), which left a final dataset of 81,124 unique tweets (52,303 for Biden, 28,821 for Ryan). Every point in the graph is the mean sentiment of the tweets gathered for that minute. The farther a point sits above zero, the more positive the sentiment of the tweets; the farther below zero, the more negative. It would be very interesting to compare this against the transcript. The one very noticeable takeaway is the jump in sentiment as soon as the debate was over at 22:30.

R Code for this data collection and graphing

To collect this data I updated my original code from the presidential debate as follows:

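library(twitteR)  # searchTwitter()
library(plyr)     # laply()
# score.sentiment(), positive.words and negative.words must already be loaded
vp<-function(){
# pull up to 1500 recent tweets mentioning each candidate
Ryan=searchTwitter('@PaulRyanVP', n=1500)
Biden=searchTwitter('@JoeBiden', n=1500)
# extract the tweet text from each status object
textRyan=laply(Ryan, function(t) t$getText())
textBiden=laply(Biden, function(t) t$getText())
# score each tweet and label it with its candidate
resultRyan=score.sentiment(textRyan, positive.words, negative.words)
resultRyan$candidate='Ryan'
resultBiden=score.sentiment(textBiden, positive.words, negative.words)
resultBiden$candidate='Biden'
result<-merge(resultBiden,resultRyan, all=TRUE)
result$candidate<-as.factor(result$candidate)
# timestamp for this batch, in the form "Thu Oct 11 21:00:00 2012"
result$time<-date()
return(result)
}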
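
The score.sentiment() function and the positive.words and negative.words vectors are not defined in this post. Here is a minimal sketch of what such a scorer might look like, assuming a simple word-count approach (positive matches minus negative matches per tweet). The name score_sketch and the tiny word lists are made up for illustration and are not the lexicon behind the graph.

```r
# Hypothetical stand-in for score.sentiment(): the score is the number of
# positive words minus the number of negative words in each text.
score_sketch <- function(texts, pos.words, neg.words) {
  scores <- vapply(texts, function(txt) {
    # lower-case, replace punctuation and digits with spaces, split on whitespace
    clean <- tolower(gsub("[[:punct:][:digit:]]+", " ", txt))
    words <- unlist(strsplit(clean, "\\s+"))
    sum(words %in% pos.words) - sum(words %in% neg.words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = texts, stringsAsFactors = FALSE)
}

# Illustrative word lists only
pos <- c("great", "good", "win")
neg <- c("bad", "awful", "lie")

res <- score_sketch(c("Great answer, good point!", "That was awful", "No opinion"),
                    pos, neg)
print(res)
```

A scorer of this shape returns one row per tweet with score and text columns, which is what the rest of the code relies on.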

Then, to have R automatically collect the data every 60 seconds in an endless loop (I wasn’t sure at the time when I wanted to stop it), you just run a repeat loop.

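debate<-vp()
repeat {
startTime<-Sys.time()
x<-vp()
debate<-merge(x, debate, all=TRUE)
# wait out whatever is left of the 60-second interval
sleepTime<-60-as.numeric(difftime(Sys.time(), startTime, units="secs"))
if (sleepTime>0)
Sys.sleep(sleepTime)
}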
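
Since the loop above never ends by itself, one alternative is to give it a stop time. This is a minimal sketch, assuming a generic collector function. The names collect_until and fake_vp and their arguments are hypothetical, and rbind() is used here in place of merge() to stack the batches.

```r
# Poll `collect` every `interval` seconds until `end_time`, stacking the results.
collect_until <- function(collect, end_time, interval = 60) {
  batches <- list()
  while (Sys.time() < end_time) {
    start <- Sys.time()
    batches[[length(batches) + 1]] <- collect()
    # sleep only for whatever is left of the interval
    left <- interval - as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (left > 0) Sys.sleep(left)
  }
  do.call(rbind, batches)
}

# Stand-in collector so the sketch runs without Twitter access
fake_vp <- function() {
  data.frame(score = 0, time = date(), stringsAsFactors = FALSE)
}

out <- collect_until(fake_vp, end_time = Sys.time() + 1, interval = 0.2)
```

For the debate itself, the call would pass vp as the collector and a POSIXct end time a little after the scheduled finish.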

At 10:56pm I got bored and the debate was over, so I just hit stop and ran the following to get the graph:
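library(reshape2)  # melt()
library(ggplot2)
# drop duplicate tweets, then keep everything from 21:00 onward
x<-subset(debate, !duplicated(text))
x$minute<-as.POSIXct(strptime(x$time, "%a %b %d %H:%M:%S %Y"))
x$minute1<-format(x$minute,"%H:%M")
x<-subset(x, minute1>="21:00")
period<-unique(x$minute1)
period<-period[order(period)]
# NOTE: the assignments to Biden, Ryan and dfm were lost when this post was
# extracted; these lines are a reconstruction (mean score per minute)
Biden<-as.vector(tapply(x$score[x$candidate=="Biden"], factor(x$minute1[x$candidate=="Biden"], levels=period), mean))
Ryan<-as.vector(tapply(x$score[x$candidate=="Ryan"], factor(x$minute1[x$candidate=="Ryan"], levels=period), mean))
means<-data.frame(period, Biden, Ryan)
dfm<-melt(means, id.vars="period")
ggplot(dfm, aes(period, value, colour=variable, group=variable))+
geom_point()+geom_line()+xlab("time")+
theme(axis.text.x=element_text(angle=45),
axis.ticks=element_blank(), axis.title.y=element_blank())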
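
The per-minute averaging behind each point can be sketched on toy data. This assumes a data frame with score, candidate and an HH:MM minute label like the minute1 column above. The values are made up, and aggregate() is only one of several ways to do it.

```r
# Toy data standing in for the de-duplicated tweets (made-up scores)
toy <- data.frame(
  score     = c(1, -1, 2, 0, -2, 1),
  candidate = c("Biden", "Biden", "Ryan", "Ryan", "Biden", "Ryan"),
  minute1   = c("21:00", "21:00", "21:00", "21:01", "21:01", "21:01"),
  stringsAsFactors = FALSE
)

# Mean sentiment per candidate and minute, in long format
per_minute <- aggregate(score ~ minute1 + candidate, data = toy, FUN = mean)
print(per_minute)
```

Because the result is already in long format, it could go straight into ggplot with aes(minute1, score, colour=candidate, group=candidate), with no reshaping step.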

I have to admit, doing this actually made watching the debate kind of fun.

For cleaner access to the code, please go to my GitHub.
