Are statistics sexy? Visualising social networks certainly is! I wrote a little function, which makes producing beautiful plots depicting a mailbox with R an extremely easy task. I find visualisations of ‘social graphs’ particularly appealing. They look like flowers.
I had to use a few Python functions which can be executed within R with rJython library. The function connects to IMAP server and looks for “To:” and “From:” sections in stored emails. It should not be difficult to adapt this script to work with POP3 too. I am really impressed by what R can do (with a little bit of help from Python). Can anyone suggest a more elegant way to do the same thing without executing Python?
As rJython depends on rJava I had to install Java Development kit to launch it.
Warning: For me this function worked very well and did not do any harm to my mailbox. Despite that I am not an expert in IMAP so if you are going to run it you are doing it at your own risk.
Here is the function:
mailSoc <- function(login, pass, serv = "imap.gmail.com", #specify IMAP server ntore = 50, #ignore if addressed to more than todow = -1, #how many to download begin = -1){ #from which to start #load rJython and Python libraries require(rJython) rJython <- rJython(modules = "imaplib") rJython$exec("import imaplib") #connect to server rJython$exec(paste("mymail = imaplib.IMAP4_SSL('", serv, "')", sep = "")) rJython$exec(paste("mymail.login(\'", login, "\',\'", pass, "\')", sep = "")) #get number of available messages rJython$exec("sel = mymail.select()") rJython$exec("number = sel[1]") nofmsg <- .jstrVal(rJython$get("number")) nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2]) #if 'begin' not specified begin from the newest if(begin == -1) { begin <- nofmsg } #if 'todow' not specified download all if(todow == -1) { end <- 1 } else { end <- begin - todow } #give a little bit of information todownload <- begin - end print(paste("Found", nofmsg, "emails")) print(paste("I will download", todownload, "messages.")) print("It can take a while") data <- data.frame() #fetching emails for (i in begin:end) { nr <- as.character(i) #get sender rJython$exec(paste("typ, fro = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (from)])\')", sep = "")) rJython$exec("fro = fro[0][1]") from <- .jstrVal(rJython$get("fro")) from <- unlist(strsplit(from, "[<>\r\n, \"]")) from <- sub("from: ", "", from, ignore.case = TRUE) from <- grep("@", from, value = TRUE) #get addresees rJython$exec(paste("typ, to = mymail.fetch(\'", nr, "\', \'(BODY[HEADER.FIELDS (to)])\')", sep = "")) rJython$exec("to = to[0][1]") to <- .jstrVal(rJython$get("to")) to <- unlist(strsplit(to, "[<>\r\n, \"]")) to <- sub("to: ", "", to, ignore.case = TRUE) from <- sub("\"", "", from, ignore.case = TRUE) to <- grep("@", to, value = TRUE) #if reasonable number of addressses add to data frame if(length(to) <= ntore){ vec <- rep(from, length(to)) data <- rbind(data, data.frame(vec, to)) } #give some information about progress if((i - begin) %% 100 == 0) { print(paste((i - begin)*(-1), "/", todownload, " Downloading...", sep = "")) } } names(data) <- c("from", "to") data$from <- tolower(data$from) data$to <- tolower(data$to) #close connection rJython$exec("mymail.shutdown()") return(data) }
Now we can run eg.
#download 200 most recent emails from gmail account maild <- mailSoc("login", "password", serv = "imap.gmail.com", ntore = 40, todow = 200)
And to make a plot it is necessary to load network library
library(network) mailnet <- network(maild) plot(maild)
This is the result:
R provides many other social network analysis tools such as igraph library. For instance, it can be used to make an interactive ‘plot’:
library(igraph) h <- graph.data.frame(maild, directed = FALSE) tkplot(h, vertex.label = V(h)$name, layout=layout.fruchterman.reingold)
I would like to learn more about SNA as well as I would like to try out Gephi which can produce visualisations which are even more attractive than those made in R so I think that I will write about my first impression soon.
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series,ecdf, trading) and more...


Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).