GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

[This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Gene Expression Omnibus is NCBI’s repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data directly into R using the GEOquery bioconductor package written (and well documented) by Sean Davis, and analyze the data using the limma package.

GEO2R is a very nice web-based tool to do this graphically and automatically. Enter the GEO series number in the search box (or use this one for an example). Start by creating groups (e.g. control vs treatment, early vs late time points in a time course, etc), then select samples to add to that group.



Scroll down to the bottom and click Top 250 to run an analysis in limma (the users guide documents this well). GEO2R will automatically fetch the data, group your samples, create your design matrix for your differential expression analysis, run the analysis, and annotate the results. A big complaint with point-and-click GUI and web based applications is the lack of reproducibility. GEO2R obviates this problem by giving you all the R code it generated to run the analysis. Click the R script tab to see the R code it generated, and save it for later.



The options tab allows you to adjust the multiple testing correction method, and the value distribution tab lets you take a look at the distribution gene expression values among the samples that you assigned to your groups.



There’s no built-in quality assessment tools in GEO2R, but you can always take the R code it generated and do your own QA/QC. It’s also important to verify what values it’s pulling from each array into the data matrix. In this example, epithelial cells at various time points were compared to a reference cell line, and the log base 2 fold change was calculated. This was used in the data matrix rather than the actual expression values.

GEO2R is a very nice tool to quickly run an analysis on data in GEO. Now, if we could only see something similar for the European repository, ArrayExpress.

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

To leave a comment for the author, please follow the link and comment on their blog: Getting Genetics Done.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)