Working with Venn Diagrams

June 21, 2016

(This article was first published on Let's talk about science with R, and kindly contributed to R-bloggers)

In this post, we will learn how to create Venn diagrams for gene lists and how to retrieve the genes present in each Venn compartment with R.

In this example, we will generate random gene lists using the molbiotools gene set generator, but you can use your own gene lists if you prefer. Specifically, we will generate a random list of 257 genes to represent those that are upregulated in condition A and another list of 1570 genes to represent those that are upregulated in condition B.


Then, we will sort the gene lists and paste them into an Excel document, one list per column, which we will save as randomGeneLists.xlsx.

Now, let’s load the data into R using the gdata package.

library(gdata)  # provides read.xls()
geneLists <- read.xls("randomGeneLists.xlsx", sheet=1, stringsAsFactors=FALSE, header=FALSE)

# Notice that column 1 (V1) is padded with empty strings, because it is shorter than column 2 (V2)

# To convert this data frame to separate gene lists with the empty strings removed, we can use lapply() with the anonymous function function(x) x[x != ""]
geneLS <- lapply(as.list(geneLists), function(x) x[x != ""])

# If this is a bit confusing, you can also write a named function and then use it in lapply()
removeEMPTYstrings <- function(x) {
  newVectorWOstrings <- x[x != ""]
  return(newVectorWOstrings)
}

geneLS2 <- lapply(as.list(geneLists), removeEMPTYstrings)

# You can print the last 6 entries of each vector stored in your list, as follows:
lapply(geneLS, tail)
lapply(geneLS2, tail) # Both methods return the same results
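
If you do not have the spreadsheet at hand, the empty-string cleanup can be tried on a small made-up data frame. This is only a sketch: toyLists, toyLS and the gene symbols are hypothetical stand-ins for the imported data, not part of the original analysis.

```r
# Toy data frame mimicking the imported sheet (gene symbols are made up).
# The shorter column is padded with empty strings, as happens on import.
toyLists <- data.frame(
  V1 = c("GENE1", "GENE2", "", ""),
  V2 = c("GENE2", "GENE3", "GENE4", "GENE5"),
  stringsAsFactors = FALSE
)

# Drop the padding from each column; the result is a list of character vectors
toyLS <- lapply(as.list(toyLists), function(x) x[x != ""])

# Number of genes kept per list: V1 keeps 2 genes, V2 keeps 4
lengths(toyLS)
```

The same lapply() call works unchanged on geneLists, because a data frame is a list of columns.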

# We can rename our list vectors
names(geneLS) <- c("ConditionA", "ConditionB")

# Now we can plot a Venn diagram with the VennDiagram R package, as follows:
library(VennDiagram)
VENN.LIST <- geneLS
venn.plot <- venn.diagram(VENN.LIST, NULL, fill=c("darkmagenta", "darkblue"), alpha=c(0.5,0.5), cex = 2, cat.fontface=4, category.names=c("A", "B"), main="Random Gene Lists")

# To draw the Venn diagram on the current graphics device, we use the grid.draw() function
grid.draw(venn.plot)

# To get the list of genes present in each Venn compartment, we can use the gplots package
library(gplots)
a <- venn(VENN.LIST, show.plot=FALSE)

# You can inspect the contents of this object with the str() function
str(a)

# By inspecting the structure of the a object created, 
# you notice two attributes: 1) dimnames 2) intersections
# We can store the intersections in a new object named inters
inters <- attr(a,"intersections")

# We can summarize the contents of each venn compartment, as follows:
# in 1) ConditionA only, 2) ConditionB only, 3) ConditionA & ConditionB
lapply(inters, head) 
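
As a cross-check that needs no extra packages, the same three compartments can be computed with base R set operations. The sketch below uses made-up gene symbols; toyA, toyB, onlyA, onlyB and shared are hypothetical names introduced only for illustration.

```r
# Toy gene lists (made-up symbols, for illustration only)
toyA <- c("GENE1", "GENE2", "GENE3")
toyB <- c("GENE3", "GENE4")

# Each Venn compartment expressed as a set operation
onlyA  <- setdiff(toyA, toyB)    # genes in A but not in B
onlyB  <- setdiff(toyB, toyA)    # genes in B but not in A
shared <- intersect(toyA, toyB)  # genes present in both lists

# Sanity check: compartment sizes add up to the number of unique genes overall
stopifnot(length(onlyA) + length(onlyB) + length(shared) ==
          length(union(toyA, toyB)))

list(onlyA = onlyA, onlyB = onlyB, shared = shared)
```

With the data from this post, you would pass geneLS$ConditionA and geneLS$ConditionB in place of toyA and toyB.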


Now you are ready to review the genes in each section of the Venn diagram separately. Alternatively, you can use the Venny web tool, which is a great way to start looking at your data, and then write a modified version of this R script to make a more exhaustive figure or to facilitate downstream analysis.
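
One possible way to hand the compartments to downstream tools is to write each one to its own text file. This is a minimal sketch under stated assumptions: toyInters is a made-up stand-in for the inters list, and the output directory and file names are hypothetical choices.

```r
# Stand-in for the inters list (made-up genes); in the script above it would
# come from attr(a, "intersections")
toyInters <- list(A = c("GENE1", "GENE2"), B = "GENE4", `A:B` = "GENE3")

outDir <- tempdir()  # hypothetical output location; use your own folder

for (nm in names(toyInters)) {
  # Replace characters that are awkward in file names, such as ":"
  fileName <- paste0(gsub("[^A-Za-z0-9]", "_", nm), ".txt")
  # One gene per line
  writeLines(toyInters[[nm]], file.path(outDir, fileName))
}

# List the text files now present in the output directory
outFiles <- list.files(outDir, pattern = "\\.txt$")
outFiles
```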

Feel free to leave comments or email me at [email protected]
