Genome annotation with NCBI2R

[This article was first published on Milano R net, and kindly contributed to R-bloggers.]

It’s very convenient to manage data with R: you can import your dataset, find many packages that meet your needs, and then plot your results.
However, retrieving data from online databases can be tedious. You need to use each database's specific API, perhaps writing your scripts in a new programming language, then convert the data into a table format and finally import it into R.
There are many R packages for exploring online biological databases. I work with SNPs and genomic data in particular: some of the relevant packages are in Bioconductor, but there are others, such as RNCBI and rentrez, which are also very suitable.
In this post I’d like to show you a new package that lets you get information from the NCBI database: ‘NCBI2R’.
The ‘NCBI2R’ package has many functions to retrieve data. If you want information about a list of SNPs, you just need to type:
GetSNPInfo( list_of_snp ) and you’ll get chr, position, locusID and genesymbol.
There is an analogous function to get information about genes: GetGeneInfo().
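As a rough sketch of the two calls above (not taken from the package documentation): the rs identifiers are placeholders, safe_ncbi2r() is a hypothetical helper written for this example, the locusID column name is assumed from the description above, and passing those locus IDs to GetGeneInfo() is an assumption too. NCBI2R needs a live connection to NCBI, so the helper returns NULL instead of stopping when the package or the service is unavailable.

```r
# Placeholder rs identifiers, used only for illustration.
list_of_snp <- c("rs12456", "rs229492")

# Hypothetical helper: call an exported NCBI2R function by name and
# return NULL if the package is missing or the remote lookup fails.
safe_ncbi2r <- function(fun_name, ...) {
  if (!requireNamespace("NCBI2R", quietly = TRUE)) {
    message("NCBI2R is not installed; skipping ", fun_name, "().")
    return(NULL)
  }
  tryCatch(
    getExportedValue("NCBI2R", fun_name)(...),
    error = function(e) {
      message(fun_name, "() failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Expected to hold chr, position, locusID and genesymbol for the SNPs.
snp_info <- safe_ncbi2r("GetSNPInfo", list_of_snp)

gene_info <- NULL
if (is.data.frame(snp_info) && "locusID" %in% names(snp_info)) {
  # Assumption: GetGeneInfo() accepts the locus IDs returned above.
  locus_ids <- unique(as.character(snp_info$locusID))
  locus_ids <- locus_ids[!is.na(locus_ids) & locus_ids != ""]
  if (length(locus_ids) > 0) {
    gene_info <- safe_ncbi2r("GetGeneInfo", locus_ids)
  }
}
```

If both lookups succeed, snp_info and gene_info can be inspected with head() like any other data frame.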
You will find plenty of other functions to help you reach your goal:
– GetGenesInRegion() (try to guess what it does...)
– GetGenesInSNPs() which returns a vector of genes that the provided SNPs are located within
– GetSNPsInGenes() which returns the SNPs within the boundary of genes
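The SNP-to-gene and gene-to-SNP functions can be chained, roughly as sketched below. This is only an illustration under assumptions: the rs identifiers are placeholders, and I assume that the vector returned by GetGenesInSNPs() is acceptable input for GetSNPsInGenes(). Both calls query NCBI, so errors are caught and reported rather than stopping the script.

```r
# Placeholder rs identifiers, used only for illustration.
snps <- c("rs12456", "rs229492")

genes <- NULL
snps_in_genes <- NULL

if (requireNamespace("NCBI2R", quietly = TRUE)) {
  tryCatch({
    # Genes that the provided SNPs are located within.
    genes <- NCBI2R::GetGenesInSNPs(snps)
    if (length(genes) > 0) {
      # Assumption: the gene vector can be passed straight back in
      # to list the SNPs within the boundaries of those genes.
      snps_in_genes <- NCBI2R::GetSNPsInGenes(genes)
    }
  }, error = function(e) {
    message("NCBI lookup failed: ", conditionMessage(e))
  })
} else {
  message("NCBI2R is not installed; nothing was queried.")
}
```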

The package provides functions for:

  • Gene Ontologies – GetGOs()
  • interactions – GetInteractions()
  • Linkage Disequilibrium information from HapMap – GetLDInfo()
  • neighbouring genes – GetNeighGenes()
  • pathways in which the genes are involved – GetPathways()
  • flanking sequence – GetSNPFlankSeq()

and so on.

If you work on GWAS, you may find the function GetPublishedGWAS() useful. I played with it a little, and it works well!

Two other useful functions are VisualiseRegion(), which displays a genomic region, and MakeHTML(), which puts the genetic annotation dataframes into an HTML document.

There are also ‘funny’ functions such as AminoAcids(), which returns a reference table of amino acids, and NatureJobs(), which returns a dataframe of job listings!

Do you need to open a webpage (e.g. a link to gbrowse) at one point in your script?
Type OpenURL(urls, safety=10)

Check the package documentation and have a go!
