Working with Venn Diagrams

[This article was first published on Let's talk about science with R, and kindly contributed to R-bloggers.]

In this post, we will learn how to create Venn diagrams for gene lists in R and how to retrieve the genes present in each Venn compartment.

In this particular example, we will generate random gene lists using the molbiotools gene set generator, but you can use your own gene lists if you prefer. Specifically, we will generate a random list of 257 genes to represent those that are upregulated in condition A and another list of 1570 genes to represent those that are upregulated in condition B.
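
If you would rather stay in R than use the web tool, a minimal sketch along these lines simulates two lists of the same sizes. The gene identifiers (GENE00001 and so on), the pool size of 20000, and the seed are made up for illustration; they are not output from molbiotools.

```r
# Simulate two random gene lists (hypothetical IDs, not real gene symbols)
set.seed(42)                                 # arbitrary seed, for reproducibility
genePool <- sprintf("GENE%05d", 1:20000)     # assumed pool of 20000 made-up IDs

# sample() draws without replacement by default, so each list has no duplicates
conditionA <- sort(sample(genePool, 257))    # stands in for "upregulated in condition A"
conditionB <- sort(sample(genePool, 1570))   # stands in for "upregulated in condition B"

length(conditionA)
length(conditionB)
```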

Then, we will sort the gene lists and paste them into an Excel document, one list per column, which we will save as randomGeneLists.xlsx.
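
Because the two lists differ in length, the shorter column ends up padded with empty cells. A rough sketch of the same layout built directly in R is shown here. The toy lists, and the use of a CSV file in a temporary directory instead of Excel, are assumptions for illustration only.

```r
# Toy lists of unequal length (hypothetical gene IDs)
conditionA <- c("GENE1", "GENE2")
conditionB <- c("GENE2", "GENE3", "GENE4", "GENE5")

# Pad the shorter list with empty strings so both columns have equal length
n <- max(length(conditionA), length(conditionB))
padded <- data.frame(
  V1 = c(conditionA, rep("", n - length(conditionA))),
  V2 = c(conditionB, rep("", n - length(conditionB))),
  stringsAsFactors = FALSE
)

# Write without header or row names, mirroring a sheet with one list per column
csvFile <- file.path(tempdir(), "randomGeneLists.csv")
write.table(padded, csvFile, sep = ",", row.names = FALSE, col.names = FALSE)

# Reading it back gives columns V1 and V2, with "" filling the shorter one
toyLists <- read.csv(csvFile, header = FALSE, stringsAsFactors = FALSE)
toyLists
```

A data frame shaped like toyLists can be passed through the same empty-string removal step used for the Excel data.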

Now, let’s load the data into R using read.xls() from the gdata package.

library(gdata) # provides read.xls()
geneLists <- read.xls("randomGeneLists.xlsx", sheet=1, stringsAsFactors=FALSE, header=FALSE)

# Notice that column 1 (V1) is padded with empty strings so that both columns of the data frame have the same length

# To convert this data frame to separate gene lists with the empty strings removed, we can use lapply() with our homemade function(x) x[x != ""]
geneLS <- lapply(as.list(geneLists), function(x) x[x != ""])

# If this is a bit confusing, you can also write a named function and then use it in lapply()
removeEMPTYstrings <- function(x) {
  # Keep only the entries that are not empty strings
  newVectorWOstrings <- x[x != ""]
  return(newVectorWOstrings)
}

geneLS2 <- lapply(as.list(geneLists), removeEMPTYstrings)

# You can print the last 6 entries of each vector stored in your list, as follows:
lapply(geneLS, tail)
lapply(geneLS2, tail) # Both methods return the same results

# We can rename our list vectors
names(geneLS) <- c("ConditionA", "ConditionB")

# Now we can plot a Venn diagram with the VennDiagram R package, as follows:
library(VennDiagram)
VENN.LIST <- geneLS
# Passing NULL as the filename returns the plot as a grid object instead of writing it to a file
venn.plot <- venn.diagram(VENN.LIST, NULL, fill=c("darkmagenta", "darkblue"), alpha=c(0.5,0.5), cex = 2, cat.fontface=4, category.names=c("A", "B"), main="Random Gene Lists")

# To draw the Venn diagram we will use the grid.draw() function
grid.draw(venn.plot)

# To get the list of genes present in each Venn compartment we can use the gplots package
library(gplots)
a <- venn(VENN.LIST, show.plot=FALSE)

# You can inspect the contents of this object with the str() function
str(a)

# By inspecting the structure of the a object created, 
# you notice two attributes: 1) dimnames 2) intersections
# We can store the intersections in a new object named inters
inters <- attr(a,"intersections")

# We can preview the first few genes in each Venn compartment, as follows:
# in 1) ConditionA only, 2) ConditionB only, 3) ConditionA & ConditionB
lapply(inters, head) 


Now you are ready to review the genes in each section of the Venn diagram separately. Alternatively, the Venny web tool is a great way to start looking at your data; you can then write a modified version of this R script to make a more polished figure or to feed the gene lists into downstream analysis.
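
As a cross-check that does not depend on gplots, the same three compartments can be computed with base R set functions. This is a sketch on toy lists with made-up gene IDs, not the data used above; the names toyLS and compartments are introduced only for this example.

```r
# Toy gene lists (hypothetical IDs)
toyLS <- list(
  ConditionA = c("GENE1", "GENE2", "GENE3"),
  ConditionB = c("GENE2", "GENE3", "GENE4", "GENE5")
)

# Each Venn compartment expressed as a base R set operation
compartments <- list(
  ConditionA_only = setdiff(toyLS$ConditionA, toyLS$ConditionB),
  ConditionB_only = setdiff(toyLS$ConditionB, toyLS$ConditionA),
  both            = intersect(toyLS$ConditionA, toyLS$ConditionB)
)

# Number of genes in each compartment
lengths(compartments)
```

Swapping toyLS for your own named list of gene vectors should give the genes behind each number in the diagram.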

Feel free to leave comments.
