Using R to find Obama’s most frequent twitter hashtags

November 4, 2013

(This article was first published on Decisions and R, and kindly contributed to R-bloggers)

I've been exploring Jeff Gentry's twitteR package, which has a ton of great functionality for interacting with Twitter data in R. Today, I thought a bit about a problem I've noticed several times on Twitter: users' profiles are often only noisy signals of the content they tweet about!

I decided that a table of a user's most frequently used hashtags might give a better sense of the content they tweet about. My code to extract the hashtags is below. (Note: you'll need to load the twitteR package and complete the OAuth authentication first. If you're having trouble with this, try visiting this page.)

Here's the code I used:

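library(twitteR)  # userTimeline(), twListToDF()
library(ggplot2)  # plotting
# x1 is not defined in this post; it is assumed to come from the OAuth setup step
tw = userTimeline("BarackObama", cainfo = x1, n = 3200)
tw = twListToDF(tw)
vec1 = tw$text
# Return a data frame of hashtags and their frequencies, most frequent first
extract.hashes = function(vec){
  hash.pattern = "#[[:alpha:]]+"
  have.hash = grep(x = vec, pattern = hash.pattern)
  hash.matches = gregexpr(pattern = hash.pattern,
                          text = vec[have.hash])
  extracted.hash = regmatches(x = vec[have.hash], m = hash.matches)
  # Lower-case the tags so that different capitalizations are counted together
  df = data.frame(table(tolower(unlist(extracted.hash))))
  colnames(df) = c("tag","freq")
  df = df[order(df$freq,decreasing = TRUE),]
  return(df)
}
dat = head(extract.hashes(vec1),50)
# Reorder the tag factor by frequency so the bars are sorted in the plot
dat2 = transform(dat,tag = reorder(tag,freq))
# stat = "identity" plots the precomputed freq values instead of counting rows
p = ggplot(dat2, aes(x = tag, y = freq)) + geom_bar(stat = "identity", fill = "blue")
p + coord_flip() + labs(title = "Hashtag frequencies in the tweets of the Obama team (@BarackObama)")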
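
If you'd like to try the extraction step without setting up OAuth, here is a minimal sketch that runs the same regular-expression approach on a small character vector in base R. The sample tweets and the name sample.tweets are invented for illustration; they are not real data from the @BarackObama timeline.

```r
# Hypothetical sample tweets, invented for illustration only
sample.tweets = c(
  "Time to act on #ImmigrationReform",
  "Get covered today. #GetCovered #ACA",
  "No hashtags in this one",
  "#getcovered before the deadline"
)

# Same pattern as in the post: a "#" followed by one or more letters
hash.pattern = "#[[:alpha:]]+"

# gregexpr() locates every match in each element;
# regmatches() pulls out the matched text (character(0) where there is no match)
hash.matches = gregexpr(pattern = hash.pattern, text = sample.tweets)
extracted = unlist(regmatches(x = sample.tweets, m = hash.matches))

# Lower-case before counting so "#GetCovered" and "#getcovered" are pooled
df = data.frame(table(tolower(extracted)))
colnames(df) = c("tag", "freq")
df = df[order(df$freq, decreasing = TRUE), ]
print(df)
```

One thing to keep in mind: because the pattern only matches letters, a tag containing digits or underscores is cut off at the first non-letter character. If that matters for your data, a character class such as [[:alnum:]_] may be worth trying instead, though I haven't tested it against real tweets here.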
