SNA: Visualising an email box with R

August 10, 2011

[This article was first published on Expansed » R, and kindly contributed to R-bloggers.]

Are statistics sexy? Visualising social networks certainly is! I wrote a little function that makes it very easy to produce beautiful plots of a mailbox with R. I find visualisations of ‘social graphs’ particularly appealing: they look like flowers.

I had to use a few Python functions, which can be executed within R using the rJython library. The function connects to an IMAP server and reads the “To:” and “From:” header fields of the stored emails. It should not be difficult to adapt this script to work with POP3 too. I am really impressed by what R can do (with a little bit of help from Python). Can anyone suggest a more elegant way to do the same thing without executing Python?

As rJython depends on rJava, I had to install the Java Development Kit to run it.

Warning: this function worked very well for me and did no harm to my mailbox. However, I am not an expert in IMAP, so if you run it, you do so at your own risk.

Here is the function:

mailSoc <- function(login,
                    pass,
                    serv = "imap.gmail.com", #specify IMAP server
                    ntore = 50, #ignore if addressed to more than
                    todow = -1, #how many to download
                    begin = -1){  #from which to start
 
  #load rJython and Python libraries
  require(rJython)  
  rJython <- rJython(modules = "imaplib")
  rJython$exec("import imaplib")
 
  #connect to server
  rJython$exec(paste("mymail = imaplib.IMAP4_SSL('",
                     serv, "')", sep = ""))
  rJython$exec(paste("mymail.login(\'",
                     login, "\',\'",
                     pass, "\')", sep = ""))
 
  #get number of available messages
  rJython$exec("sel = mymail.select()")
  rJython$exec("number = sel[1]")
  nofmsg <- .jstrVal(rJython$get("number"))
  nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2])
 
  #if 'begin' not specified begin from the newest
  if(begin == -1)
  {
    begin <- nofmsg
  }
 
  #if 'todow' not specified download all
  if(todow == -1)
  {
    end <- 1
  }
  else
  {
    #stop after 'todow' messages, but never go below message 1
    end <- max(1, begin - todow + 1)
  }
 
  #give a little bit of information
  todownload <- begin - end + 1
  print(paste("Found", nofmsg, "emails"))
  print(paste("I will download", todownload, "messages."))
  print("It can take a while")
 
  data <- data.frame()
 
  #fetching emails
  for (i in begin:end) {
    nr <- as.character(i)
 
    #get sender
    rJython$exec(paste("typ, fro = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (from)])\')", sep = ""))
    rJython$exec("fro = fro[0][1]")
    from <- .jstrVal(rJython$get("fro"))
    from <- unlist(strsplit(from, "[<>\r\n, \"]"))
    from <- sub("from: ", "", from, ignore.case = TRUE)
    from <- grep("@", from, value = TRUE)
 
    #get addressees
    rJython$exec(paste("typ, to = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (to)])\')", sep = ""))
    rJython$exec("to = to[0][1]")
    to <- .jstrVal(rJython$get("to"))
    to <- unlist(strsplit(to, "[<>\r\n, \"]"))
    to <- sub("to: ", "", to, ignore.case = TRUE)
    to <- sub("\"", "", to, ignore.case = TRUE)
    to  <- grep("@", to, value = TRUE)
 
    #if a sender was found and the number of addressees is reasonable,
    #add to data frame
    if(length(from) > 0 && length(to) <= ntore){
      vec <- rep(from, length(to))
      data <- rbind(data, data.frame(vec, to))
    }
 
    #give some information about progress
    if((i - begin) %% 100 == 0)
    {
      print(paste((i - begin)*(-1), "/", todownload,
                  " Downloading...", sep = ""))
    }
  }
  names(data) <- c("from", "to")
  data$from <- tolower(data$from)
  data$to <- tolower(data$to)
 
  #log out and close connection
  rJython$exec("mymail.logout()")
  return(data)
}
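
The address-extraction step inside the loop can be tried on its own, without a mail server. The sketch below applies the same strsplit/grep idea to a made-up header string. The helper name `extract_addresses` and the sample header are illustrative only and are not part of the function above.

```r
# Hypothetical helper: pull e-mail addresses out of one raw header field.
# It mirrors the strsplit/grep steps used inside mailSoc().
extract_addresses <- function(header) {
  # split on angle brackets, line breaks, commas, spaces and double quotes
  parts <- unlist(strsplit(header, "[<>\r\n, \"]"))
  # keep only the tokens that contain "@", in lower case
  tolower(grep("@", parts, value = TRUE))
}

# Made-up header text, for illustration only
sample_to <- "To: \"Alice A\" <Alice@example.com>, bob@example.org\r\n\r\n"
extract_addresses(sample_to)
```

Tokens such as "To:" or the display name contain no "@", so the grep step drops them; only the two addresses remain.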

Now we can run, for example:

#download 200 most recent emails from gmail account
maild <- mailSoc("login", "password", serv = "imap.gmail.com",
                ntore = 40, todow = 200)
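
Before plotting, it can be useful to check what came back. This is a minimal sketch, assuming the result has the `from` and `to` columns that the function returns; the small data frame `maild_demo` is made up so that the snippet runs without a mailbox.

```r
# Made-up edge list with the same columns that mailSoc() returns
maild_demo <- data.frame(
  from = c("alice@example.com", "alice@example.com", "bob@example.org",
           "alice@example.com"),
  to   = c("me@example.net", "bob@example.org", "me@example.net",
           "me@example.net"),
  stringsAsFactors = FALSE
)

# number of messages per sender, most active first
sender_counts <- sort(table(maild_demo$from), decreasing = TRUE)

# number of messages per (from, to) pair
pair_counts <- aggregate(n ~ from + to,
                         data = transform(maild_demo, n = 1),
                         FUN = sum)

sender_counts
pair_counts
```

The pair counts could later serve as edge weights, although the plots in this post do not use them.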

To make a plot, it is necessary to load the network library:

library(network)
mailnet <- network(maild)
plot(mailnet)

This is the result:

[Figure: Social network analysis, visualisation of a mailbox with R]

R provides many other social network analysis tools, such as the igraph library. For instance, it can be used to make an interactive ‘plot’:

library(igraph)
h <- graph.data.frame(maild, directed = FALSE)
tkplot(h, vertex.label = V(h)$name,
       layout=layout.fruchterman.reingold)
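
A non-interactive variant may be handy when Tcl/Tk is not available. The sketch below is only an illustration on a made-up edge list (`maild_demo` and `h_demo` are hypothetical names). It uses `graph_from_data_frame()` and `layout_with_fr()`, the newer igraph names for the two functions called above, and sizes each vertex by its number of distinct correspondents.

```r
library(igraph)

# Made-up edge list standing in for the output of mailSoc()
maild_demo <- data.frame(
  from = c("alice@example.com", "alice@example.com", "bob@example.org",
           "alice@example.com"),
  to   = c("me@example.net", "bob@example.org", "me@example.net",
           "me@example.net"),
  stringsAsFactors = FALSE
)

h_demo <- graph_from_data_frame(maild_demo, directed = FALSE)

# merge repeated edges and drop self-loops before counting
h_demo <- simplify(h_demo)

# number of distinct correspondents per address
deg <- degree(h_demo)
sort(deg, decreasing = TRUE)

# static plot with the Fruchterman-Reingold layout,
# vertex size scaled by degree
plot(h_demo, layout = layout_with_fr(h_demo),
     vertex.size = 5 + 5 * deg, vertex.label = V(h_demo)$name)
```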

I would like to learn more about SNA and to try out Gephi, which can produce visualisations even more attractive than those made in R, so I expect to write about my first impressions soon.
