What does it say about R?

(This article was first published on jkunst.com: Entries for category R, and kindly contributed to R-bloggers)

In the last post I showed a way to plot a gexf file in R using the rgexf package and the Sigmajs library. Now we need some data to feed that piece of code, so I decided to collect tweets about R. For this I used the twitteR package to search for “#rstats”, then cleaned the texts and extracted all the hashtags. The associations between hashtags follow a simple rule: if a tweet says “#rstats and #data are my drugs”, those two hashtags are related. Finally I added some graphical attributes, such as a node size based on the number of mentions and some random colors to make the graph more attractive.
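
As a minimal sketch of that co-occurrence rule, the snippet below builds the edge list for three made-up tweets. It uses only base R (`regmatches`, `combn`) rather than the stringr and plyr calls of the full script, and the toy tweets and the names `tags_per_tweet`, `pairs` and `edges` are assumptions introduced for illustration.

```r
# Toy tweets (made up for illustration, not real data)
tweets <- c("#rstats and #data are my drugs",
            "learning #python for #data work",
            "no hashtags here")

# Pull the hashtags out of each tweet (one character vector per tweet)
tags_per_tweet <- regmatches(tweets, gregexpr("#\\w+", tweets))

# Every unordered pair of distinct hashtags within a tweet becomes an edge
pairs <- do.call(rbind, lapply(tags_per_tweet, function(tags) {
  tags <- sort(unique(tags))
  if (length(tags) < 2) return(NULL)  # a lone hashtag relates to nothing
  t(combn(tags, 2))
}))

# Drop repeated pairs: the graph is undirected and unweighted
edges <- unique(data.frame(source = pairs[, 1], target = pairs[, 2],
                           stringsAsFactors = FALSE))
print(edges)
```

Sorting the hashtags inside each tweet before pairing them means that the same pair always comes out in the same order, so `unique()` is enough to remove duplicates.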

Finally, run the code and see the result 😉.

There are many tweets about #python among the #rstats tweets! Unsurprisingly, many tweets are about data (#data, #bi, #datamining, #bigdata, etc.). On the other hand, there are also conversations about #sas, #matlab, #sastip, and so on.


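# Packages used below
library(twitteR)  # searchTwitter, twListToDF
library(stringr)  # str_extract_all
library(plyr)     # laply, ldply
library(rgexf)    # write.gexf
library(sna)      # gplot.layout.kamadakawai

# Some tweets about R (assumes an authenticated twitteR session)
tweets <- tolower(twListToDF(searchTwitter(searchString="#rstats", n=1500))$text)
hashtags_remove <- c("#rstats", "#r")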

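# Cleaning the tweets: drop the search hashtags themselves
# (\\b keeps "#r" from also eating the start of longer tags such as "#rstudio")
for(term in hashtags_remove) tweets <- gsub(paste0(term, "\\b"), "", tweets)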

# Extract the hashtags
hashtags <- unique(unlist(str_extract_all(tolower(tweets), "#\\w+")))
hashtags <- setdiff(hashtags, hashtags_remove)

# Node size according to the number of tweets in which each hashtag appears
# (\\b stops "#data" from also matching "#datamining")
nodesizes <- laply(hashtags, function(hashtag){
  sum(grepl(paste0(hashtag, "\\b"), tweets))
})

# Scaling sizes
nodesizes <- 1 + log(nodesizes, base = 3)

nodes <- data.frame(id = c(1:length(hashtags)), label = hashtags, stringsAsFactors=F)

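# One row per (hashtag, co-occurring hashtag) pair
relations <- ldply(hashtags, function(hashtag){
  hashtag_related <- unlist(str_extract_all(tweets[grepl(paste0(hashtag, "\\b"), tweets)], "#\\w+"))
  hashtag_related <- setdiff(hashtag_related, hashtag)
  # Skip hashtags that never co-occur with another one
  if(length(hashtag_related) == 0) return(NULL)
  data.frame(source = which(hashtags==hashtag),
             target = which(hashtags %in% hashtag_related))
})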

# It is an undirected graph, so put the smaller id first and remove the duplicates
for(row in seq_len(nrow(relations))){
  relations[row,] <- sort(unlist(relations[row,]))
}

relations <- unique(relations)

# Some colors
nodecolors <- data.frame(r = sample(1:249, size = nrow(nodes), replace=T),
                         g = sample(1:249, size = nrow(nodes), replace=T),
                         b = sample(1:249, size = nrow(nodes), replace=T),
                         a = 1)

# Adjacency matrix, used only to compute the layout
links <- matrix(0, nrow = length(hashtags), ncol = length(hashtags))
for(edge in seq_len(nrow(relations))){
  links[relations$target[edge], relations$source[edge]] <- 1
}

positions <- gplot.layout.kamadakawai(links, layout.par=list())
positions <- cbind(positions, 0) # needs a z axis

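# NOTE: the original listing is cut off after "nodes=nodes,". The remaining
# arguments are an assumed completion built from the objects defined above.
graph <- write.gexf(nodes=nodes,
                    edges=relations,
                    nodesVizAtt=list(color=nodecolors,
                                     size=nodesizes,
                                     position=positions))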
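
To see what the size scaling in the script does, here is a small worked example with hypothetical mention counts (the values in `counts` are made up for illustration). The `1 +` term gives a hashtag mentioned once a size of 1 instead of 0, and each tripling of the mentions adds one unit of size.

```r
# Hypothetical mention counts for four hashtags (illustrative values only)
counts <- c(1, 3, 9, 27)

# Same transformation as in the script: 1 + log base 3 of the count
sizes <- 1 + log(counts, base = 3)
print(round(sizes, 2))
```

With a logarithmic scale a very popular hashtag stays only a few times larger than a rare one, which keeps the biggest nodes from covering the rest of the graph.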

