biomaRt

[This article was first published on compBiomeBlog, and kindly contributed to R-bloggers.]

I use R and Bioconductor for most of my work. I am also increasingly using R for things I would previously have done in Perl. One example is the Bioconductor package biomaRt.

As the name suggests, it allows access to BioMart via R. BioMart is a way of accessing large online databases such as Ensembl. For example, you may want to convert gene IDs from Entrez to symbols, or retrieve the 5 kb upstream of the transcription start site for a list of genes. There are lots of things you can do with it.

biomaRt lets you do all this via R. This is particularly appealing to me because I do differential gene expression analysis in R, so I already have lists of genes in R objects and can retrieve lots of information about them. Maybe I want all the GO annotations for a gene list, or a list of any SNPs within the coding region.
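As a rough sketch of the ID conversion and GO lookup mentioned above, both can be written as thin wrappers around getBM(). The helper names entrez_to_symbol and entrez_to_go are made up for illustration, and the attribute and filter names ("entrezgene_id", "hgnc_symbol", "go_id", "name_1006", "namespace_1003") are assumptions about the current human Ensembl dataset, so check them against listAttributes() and listFilters() before relying on them.

```r
# Sketch only: the attribute and filter names below are assumptions; verify
# them with listAttributes() and listFilters() for your Ensembl release.

# Hypothetical helper: map Entrez Gene IDs to HGNC symbols.
entrez_to_symbol <- function(ids, mart) {
  biomaRt::getBM(
    attributes = c("entrezgene_id", "hgnc_symbol"),
    filters    = "entrezgene_id",
    values     = ids,
    mart       = mart
  )
}

# Hypothetical helper: fetch GO annotations (ID, term name, ontology branch).
entrez_to_go <- function(ids, mart) {
  biomaRt::getBM(
    attributes = c("entrezgene_id", "go_id", "name_1006", "namespace_1003"),
    filters    = "entrezgene_id",
    values     = ids,
    mart       = mart
  )
}

# Usage (needs network access to the Ensembl BioMart server):
#   ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
#   entrez_to_symbol(c("7157", "3845"), ensembl)
#   entrez_to_go(c("7157", "3845"), ensembl)
```

Both helpers return a data frame with one row per match, so a gene with several GO terms appears on several rows.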

Anyway, it is pretty useful, and the documentation isn’t bad either.

http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html

To give a brief example of how it works:

library(biomaRt)
library(xtable)

# Entrez Gene IDs to look up (TP53 and KRAS)
ids <- c("7157", "3845")

# Connect to the Ensembl mart and select the human gene dataset
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

# Fetch 5 kb of sequence upstream of each transcript and save it as FASTA
# (recent Ensembl releases name this filter "entrezgene_id"; see listFilters(ensembl))
seqs <- getSequence(id = ids, type = "entrezgene", seqType = "transcript_flank", upstream = 5000, mart = ensembl)
exportFASTA(sequences = seqs, file = "example.fas")

# Look up basic gene annotation and write it out as an HTML table
results <- getGene(id = ids, type = "entrezgene", mart = ensembl)
print(xtable(results), type = "html", file = "Example.html")

This code retrieves the 5 kb upstream of the transcription start sites of the two genes in the ‘ids’ vector (which could be much longer) and writes the sequences to a FASTA file. It then generates an HTML output file with information about these genes. Simple and effective.

The functions
  • listAttributes(ensembl)
  • listFilters(ensembl)
can be used to show the names of the things you can query and the things you can filter on.
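The tables these two functions return are long, so it helps to search them. Here is a minimal sketch, assuming the returned data frames have name and description columns; match_fields is a made-up helper name, not part of biomaRt.

```r
# Hypothetical helper: keep the rows of an attribute or filter table whose
# name or description matches a pattern (case-insensitive).
# Assumes `tbl` has `name` and `description` columns.
match_fields <- function(tbl, pattern) {
  hits <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hits, , drop = FALSE]
}

# Usage (needs network access to the Ensembl BioMart server):
#   ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
#   match_fields(biomaRt::listAttributes(ensembl), "entrez")
#   match_fields(biomaRt::listFilters(ensembl), "entrez")
```

This is a quick way to confirm the exact filter or attribute name before putting it into a query.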

You can also access lots of other databases, not just Ensembl as shown here.
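To see what else is on offer, listMarts() shows the marts a host provides and listDatasets() shows the datasets inside one of them. The wrapper below is a small sketch; list_datasets_for is a made-up name, and the mart name in the usage comment is an assumption that may not match what your host lists.

```r
# Hypothetical helper: connect to a named mart and list its datasets.
# Extra arguments (for example `host`) are passed through to useMart().
list_datasets_for <- function(mart_name, ...) {
  mart <- biomaRt::useMart(mart_name, ...)
  biomaRt::listDatasets(mart)
}

# Usage (needs network access):
#   biomaRt::listMarts()                   # marts on the default host
#   list_datasets_for("ENSEMBL_MART_SNP")  # assumed mart name; pick one from listMarts()
```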

Enjoy.
