One-liners which make me love R: twitteR’s searchTwitter() #rstats

July 21, 2011

(This article was first published on Things I tend to forget » R, and kindly contributed to R-bloggers)

R reminds me a lot of English. It’s easy to get started, but very difficult to master. So for all those times I’ve spent… well, forever… trying to figure out the “R way” of doing something, I’m glad to share these quick wins.

My recent R tutorial on mining Twitter for consumer sentiment wouldn’t have been possible without Jeff Gentry’s amazing twitteR package (available on CRAN). It does so much of the behind-the-scenes heavy lifting to access Twitter’s REST APIs that one line of code is all you need to perform a search and retrieve the results, even when they are paginated:


library(twitteR)
tweets = searchTwitter("#rstats", n=1500)

You can search for anything, of course; “#rstats” is just an example. (And if you’re really into that hashtag, the twitteR package even provides an Rtweets() function which hardcodes that search string for you.) The n=1500 requests the maximum number of tweets the Search API will return, though you may retrieve fewer, as Twitter’s search indices contain only a couple of days’ tweets.

What you get back is a list of tweets (technically “status updates”):

> head(tweets)
[1] "Cloudnumberscom: \023 #Rstats gets real in the cloud via @AddToAny"

[1] "0_h_r_1: \023 #Rstats gets real in the cloud via DecisionStats - I came across . ..."

[1] "cmprsk: RT I just joined the beta to run #Rstats in the cloud with via @cloudnumberscom"

[1] "0_h_r_1: I just joined the beta to run #Rstats in the cloud with via @cloudnumberscom"

[1] "cmprsk: RT man, the #rstats think people I am too soft on #sas, the #sas people think I am too soft on #wps, the #wps pe..."

[1] "keepstherainoff: Thanks to @cmprsk @geoffjentry and @MikeKSmith for colour-coded #Rstats GUI advice"

> class(tweets[[1]])
[1] "status"
attr(,"package")
[1] "twitteR"

Now that you have some tweets, the fun really begins. To get you started, the status class includes a very handy toDataFrame() accessor method (see ?status):

> library(plyr) 
> tweets.df = ldply(tweets, function(t) t$toDataFrame() )

> str(tweets.df)
'data.frame':	131 obs. of  10 variables:
 $ text        : Factor w/ 122 levels " \023 #Rstats gets real in the cloud via @AddToAny",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ favorited   : logi  NA NA NA NA NA NA ...
 $ replyToSN   : logi  NA NA NA NA NA NA ...
 $ created     : POSIXct, format: "2011-07-04 13:50:39" "2011-07-04 13:48:10" "2011-07-04 13:29:00" "2011-07-04 13:23:42" ...
 $ truncated   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ replyToSID  : logi  NA NA NA NA NA NA ...
 $ id          : Factor w/ 131 levels "87941406873751552",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ replyToUID  : logi  NA NA NA NA NA NA ...
 $ statusSource: Factor w/ 17 levels "<a href="" rel="nofollow">Tweet Button</a>",..: 1 2 3 1 3 4 5 5 3 4 ...
 $ screenName  : Factor w/ 64 levels "Cloudnumberscom",..: 1 2 3 2 3 4 2 5 3 6 ...
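If you'd like to see what that ldply() call is doing without hitting Twitter at all, here is a minimal sketch in base R. Everything in it is made up for illustration: mock.tweets is a hypothetical stand-in for the list returned by searchTwitter(), and to.row() plays the role of toDataFrame(), keeping only two of the ten columns shown above. It row-binds one small data frame per tweet, converts the Factor columns to character (usually what you want for text), and counts tweets per user.

```r
# Hypothetical stand-ins for tweets; the values are invented, the field
# names (text, screenName) are borrowed from the str() output above.
mock.tweets = list(
  list(text = "first #rstats tweet",  screenName = "alice"),
  list(text = "second #rstats tweet", screenName = "bob"),
  list(text = "third #rstats tweet",  screenName = "alice")
)

# Plays the role of toDataFrame(): one single-row data frame per tweet.
# stringsAsFactors = TRUE mimics the Factor columns seen in the original output.
to.row = function(t) {
  data.frame(text = t$text, screenName = t$screenName, stringsAsFactors = TRUE)
}

# Base-R equivalent of the ldply() pattern: apply to each element, then row-bind.
mock.df = do.call(rbind, lapply(mock.tweets, to.row))

# Convert the factor columns to plain character vectors.
mock.df$text = as.character(mock.df$text)
mock.df$screenName = as.character(mock.df$screenName)

# Count tweets per user, most active first.
tweets.per.user = sort(table(mock.df$screenName), decreasing = TRUE)
print(tweets.per.user)
```

The same last three statements should work on a real tweets.df, assuming it has the text and screenName columns shown in the str() output.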

You can pull a particular user’s tweets just as easily with the userTimeline() function. Heck, the package even lets you tweet from R if you use Jeff’s companion ROAuth package, but that requires more than one line….

