# When Venn diagrams are not enough – Visualizing overlapping data with Social Network Analysis in R

March 2, 2012

(This article was first published on Sustainable Research » Renglish, and kindly contributed to R-bloggers)

I recently thought about ways to visualize medications and their co-occurrences in a group of children. As long as you want to visualize up to 4 different medications, you can simply use Venn diagrams. There is a very nice R package to generate this kind of graphic for you (for a description see: Chen and Boutros, 2011). But this is of little help here.

The problem I faced involved 29 different medications and 50 children. My data was stored in a table with 29 columns – one for each medication – and 50 rows – one for each child – so that each cell indicates whether or not the child took the medication (1 or 0). Random example data of the same shape can be generated like this:

    M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), ncol=29)
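Before going further, it can help to check that the simulated table really has the shape described above. The following is only a minimal sketch in base R; the call to `set.seed()` is an assumption added here for reproducibility and is not part of the original code.

```r
# Assumption: a fixed seed, only so that the example is reproducible
set.seed(1)

# 50 children (rows) x 29 medications (columns), about 10% of cells set to 1
M <- matrix(sample(0:1, 50 * 29, replace = TRUE, prob = c(0.9, 0.1)), ncol = 29)

dim(M)      # 50 29
colSums(M)  # number of children who took each medication
rowSums(M)  # number of medications each child took
```

The column sums are the counts that later end up on the diagonal of the medication-by-medication matrix.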

### The Solution – Social Network Analysis

There are several R packages to analyze and visualize social network data – I will focus on “igraph” in this post. The problem I had was that I was not – and probably still am not – familiar with the concepts and nomenclature of this field. The key to using the data described above for network analysis was understanding that such data is called an affiliation matrix, in which individuals are affiliated with certain events. “igraph”, however, likes adjacency matrices, in which every column and every row represents a node – in our case a medication. Each off-diagonal cell then gives the number of children who received both medications, and the diagonal gives the number of times a medication was given (more information can be found on Daizaburo Shizuka's site).
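To make the difference between the two kinds of matrices concrete, here is a minimal sketch with a made-up toy table of three children and three medications. The names `toy`, `toy_adj`, `MedA`, `MedB` and `MedC` are hypothetical and only serve as illustration. Multiplying the transposed table by the table itself counts, for every pair of medications, the children who took both.

```r
# Toy affiliation matrix (hypothetical data): rows = children, columns = medications
toy <- matrix(c(1, 1, 0,
                1, 0, 1,
                1, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("child1", "child2", "child3"),
                              c("MedA", "MedB", "MedC")))

# Medication-by-medication adjacency matrix; crossprod(toy) equals t(toy) %*% toy
toy_adj <- crossprod(toy)

toy_adj
diag(toy_adj)            # 3, 2, 1: how many children took each medication
toy_adj["MedA", "MedB"]  # 2: children who took both MedA and MedB
toy_adj["MedB", "MedC"]  # 0: no child took both MedB and MedC
```

The result is symmetric, which is what an undirected network expects: the overlap of MedA with MedB is the same as the overlap of MedB with MedA.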

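We transform an affiliation matrix into an adjacency matrix in R simply by:

    adj <- t(M) %*% M

Because the medications are stored in the columns of M, the transposed matrix comes first; M %*% t(M) would instead give a 50 x 50 child-by-child matrix.

Now we can make a first bare-minimum plot:

    library(igraph)
    # graph_from_adjacency_matrix() was called graph.adjacency() in older igraph versions
    g <- graph_from_adjacency_matrix(adj, mode="undirected", weighted=TRUE, diag=FALSE)
    summary(g)
    plot(g, main="The bare minimum")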

### Adding information and spicing it up a notch

In all likelihood you want to add at least three kinds of information:

1. Labels for the nodes
2. Size of the nodes to represent the total number of events, aka medications
3. Size of the links to represent the overlap between medications

All three can be added with a few lines:

    name <- sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
    number <- diag(adj)*5+5
    width <- (E(g)$weight/2)+1
    plot(g, main="A little more information", vertex.size=number, vertex.label=name, edge.width=width)

The “igraph” package lets you adjust quite a few parameters, so you should consult the manual. I only changed some of the colors, layout, fonts, etc.

    # layout_with_lgl was called layout.lgl in older igraph versions
    plot(g, main="Spice it up a notch", vertex.size=number, vertex.label=name, edge.width=width, layout=layout_with_lgl, vertex.color="red", edge.color="darkgrey", vertex.label.family="sans", vertex.label.color="black")

Here is just the code:

    library(igraph)

    # Generate example data: 50 children (rows) x 29 medications (columns)
    M <- matrix(sample(0:1, 1450, replace=TRUE, prob=c(0.9,0.1)), ncol=29)

    # Transform the affiliation matrix into a medication-by-medication adjacency matrix
    adj <- t(M) %*% M

    # Make a simple plot
    g <- graph_from_adjacency_matrix(adj, mode="undirected", weighted=TRUE, diag=FALSE)
    summary(g)
    plot(g, main="The bare minimum")

    # Add more information
    name <- sample(c(LETTERS, letters, 1:99), 29, replace=TRUE)
    number <- diag(adj)*5+5
    width <- (E(g)$weight/2)+1
    plot(g, main="A little more information", vertex.size=number, vertex.label=name, edge.width=width)

    # Adjust some plotting parameters
    plot(g, main="Spice it up a notch", vertex.size=number, vertex.label=name, edge.width=width, layout=layout_with_lgl, vertex.color="red", edge.color="darkgrey", vertex.label.family="sans", vertex.label.color="black")
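The edge widths in the plots are driven by the off-diagonal cells of the adjacency matrix, so it can be useful to look at those numbers directly. The following is a minimal sketch in base R that lists the medication pairs with the largest overlap. The fixed seed and the names `idx` and `pairs` are assumptions introduced here for illustration; medications are identified only by their column number.

```r
# Assumption: a fixed seed, only so that the sketch is reproducible
set.seed(42)
M <- matrix(sample(0:1, 50 * 29, replace = TRUE, prob = c(0.9, 0.1)), ncol = 29)
adj <- crossprod(M)  # same as t(M) %*% M

# Keep each pair once (upper triangle) and only pairs shared by at least one child
idx <- which(upper.tri(adj) & adj > 0, arr.ind = TRUE)
pairs <- data.frame(med1 = idx[, "row"],
                    med2 = idx[, "col"],
                    overlap = adj[idx])

# Largest overlaps first
pairs <- pairs[order(-pairs$overlap), ]
head(pairs)
```

Pairs near the top of this table correspond to the thickest links in the network plot.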
