SNA: Visualising an email box with R

August 10, 2011

(This article was first published on Expansed » R, and kindly contributed to R-bloggers)

Are statistics sexy? Visualising social networks certainly is! I wrote a little function, which makes producing beautiful plots depicting a mailbox with R an extremely easy task. I find visualisations of ‘social graphs’ particularly appealing. They look like flowers.

I had to use a few Python functions which can be executed within R with rJython library. The function connects to IMAP server and looks for “To:” and “From:” sections in stored emails. It should not be difficult to adapt this script to work with POP3 too. I am really impressed by what R can do (with a little bit of help from Python). Can anyone suggest a more elegant way to do the same thing without executing Python?

As rJython depends on rJava I had to install Java Development kit to launch it.

Warning: For me this function worked very well and did not do any harm to my mailbox. Despite that I am not an expert in IMAP so if you are going  to run it you are doing it at your own risk.

Here is the function:

mailSoc <- function(login,
                    serv = "", #specify IMAP server
                    ntore = 50, #ignore if addressed to more than
                    todow = -1, #how many to download
                    begin = -1){  #from which to start
  #load rJython and Python libraries
  rJython <- rJython(modules = "imaplib")
  rJython$exec("import imaplib")
  #connect to server
  rJython$exec(paste("mymail = imaplib.IMAP4_SSL('",
                     serv, "')", sep = ""))
                     login, "\',\'",
                     pass, "\')", sep = ""))
  #get number of available messages
  rJython$exec("sel =")
  rJython$exec("number = sel[1]")
  nofmsg <- .jstrVal(rJython$get("number"))
  nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2])
  #if 'begin' not specified begin from the newest
  if(begin == -1)
    begin <- nofmsg
  #if 'todow' not specified download all
  if(todow == -1)
    end <- 1
    end <- begin - todow
  #give a little bit of information
  todownload <- begin - end
  print(paste("Found", nofmsg, "emails"))
  print(paste("I will download", todownload, "messages."))
  print("It can take a while")
  data <- data.frame()
  #fetching emails
  for (i in begin:end) {
    nr <- as.character(i)
    #get sender
   rJython$exec(paste("typ, fro = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (from)])\')", sep = ""))
    rJython$exec("fro = fro[0][1]")
    from <- .jstrVal(rJython$get("fro"))
    from <- unlist(strsplit(from, "[<>\r\n, \"]"))
    from <- sub("from: ", "", from, = TRUE)
    from <- grep("@", from, value = TRUE)
    #get addresees
    rJython$exec(paste("typ, to = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (to)])\')", sep = ""))
    rJython$exec("to = to[0][1]")
    to <- .jstrVal(rJython$get("to"))
    to <- unlist(strsplit(to, "[<>\r\n, \"]"))
    to <- sub("to: ", "", to, = TRUE)
    from <- sub("\"", "", from, = TRUE)
    to  <- grep("@", to, value = TRUE)
    #if reasonable number of addressses add to data frame
    if(length(to) <= ntore){
    vec <- rep(from, length(to))
    data <- rbind(data, data.frame(vec, to))
    #give some information about progress
    if((i - begin) %% 100 == 0)
      print(paste((i - begin)*(-1), "/", todownload,
                  " Downloading...", sep = ""))
  names(data) <- c("from", "to")
  data$from <- tolower(data$from)
  data$to <- tolower(data$to)
  #close connection

Now we can run eg.

#download 200 most recent emails from gmail account
maild <- mailSoc("login", "password", serv = "",
                ntore = 40, todow = 200)

And to make a plot it is necessary to load network library

mailnet <- network(maild)

This is the result:

Social network analysis: visualisation of mailbox with R

R provides many other social network analysis tools such as igraph library. For instance, it can be used to make an interactive ‘plot’:

h <-, directed = FALSE)
tkplot(h, vertex.label = V(h)$name,

I would like to learn more about SNA as well as I would like to try out Gephi which can produce visualisations which are even more attractive than those made in R so I think that I will write about my first impression soon.

Post to Twitter

To leave a comment for the author, please follow the link and comment on their blog: Expansed » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , ,

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)