A new version of
rentrez, our package for the NCBI's EUtils API, is making
its way around the CRAN mirrors. This release represents a substantial
improvement to rentrez, including a new vignette
that documents the whole package.
This post describes some of the new things in
rentrez, and gives us a chance
to thank some of the people who have contributed to this package's development.
Thanks to everyone who has filed an issue or written us an email about rentrez;
your contributions have been an important part of the package's development. In particular, we welcome
Han Guangchun as a new contributor to
rentrez and thank
Matthew O'Meara for posting
an issue that brought the
by_id mode for
entrez_link (discussed below) to our attention.
The New Stuff
Extract elements from the results of entrez_summary
The NCBI's “summary records” are very useful — they provide the most important
information about a given record in a relatively small and simple file.
rentrez provides the function
entrez_summary() to retrieve these records. When more
than one unique ID is passed to
entrez_summary the function returns a list of
esummary objects. For instance, you could find all the genetic variants associated
with asthma by finding links between the OMIM record for asthma and records in the database dbSNP:
    snps <- entrez_link(dbfrom="omim", db="snp", id=600807)
    snp_summs <- entrez_summary(db="snp", id=snps$links$omim_snp)
A very common use-case for
entrez_summary() is to extract a subset of the
elements from each record in that list. This release includes the function
extract_from_esummary to make this as straightforward as possible. It works with
a single element to extract:
    extract_from_esummary(snp_summs, "chr")
    ## 11079657  2786098  1031772  1031771   545659
    ##     "17"      "1"      "2"      "2"     "11"
Or with multiple elements:
    summary_table <- extract_from_esummary(snp_summs, c("chr", "global_maf", "fxn_class"))
    t(summary_table)
    ##          chr  global_maf      fxn_class
    ## 11079657 "17" "A=0.4295/2151" "intron-variant"
    ## 2786098  "1"  "T=0.1569/786"  "intron-variant"
    ## 1031772  "2"  "G=0.2131/1067" "downstream-variant-500B"
    ## 1031771  "2"  "T=0.2582/1293" ""
    ## 545659   "11" "C=0.3419/1712" "utr-variant-3-prime"
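To make the shape of these results concrete, here is a minimal base-R sketch of the same idea, run on a mock list instead of a live NCBI query. The object `mock_summs` is a hypothetical stand-in for a list of esummary records (its values are copied from the output above), and the `sapply` calls only approximate what extract_from_esummary does; they are not the package's implementation.

```r
# Hypothetical stand-in for a list of esummary records; no network access needed.
mock_summs <- list(
  "11079657" = list(chr = "17", global_maf = "A=0.4295/2151", fxn_class = "intron-variant"),
  "2786098"  = list(chr = "1",  global_maf = "T=0.1569/786",  fxn_class = "intron-variant")
)

# One element per record: a character vector named by record ID.
chrs <- sapply(mock_summs, function(rec) rec[["chr"]])

# Several elements per record: a matrix with one column per record.
fields <- c("chr", "global_maf")
tab <- sapply(mock_summs, function(rec) unlist(rec[fields]))

# Transpose so that each record becomes a row, as in the table above.
t(tab)
```

Because the multi-element result has one column per record, transposing it with `t()` is what gives the record-per-row layout shown in the post.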
entrez_link can find external links
In addition to discovering links between records in NCBI databases, the function
entrez_link now provides support for finding external links ('linkouts' in
NCBI terminology). Perhaps the most interesting example is finding links for the
full text of articles in PubMed.
Let's try and find the full text of the paper describing taxize (using that article's PMID). To
override the function's default behaviour (finding links within NCBI databases)
we set the
cmd argument to
llinks (short for library links):
    taxize_links <- entrez_link(dbfrom="pubmed", id=24555091, cmd="llinks")
    taxize_links
    ## elink object with contents:
    ##  $linkouts: links to external websites
The print function for this object tells you where the links live.
    taxize_links$linkouts
    ## $ID_24555091
    ## $ID_24555091[[1]]
    ## Linkout from F1000 Research Ltd
    ##  $Url: http://f1000research.com/a ...
    ##
    ## $ID_24555091[[2]]
    ## Linkout from Europe PubMed Central
    ##  $Url: http://europepmc.org/abstr ...
    ##
    ## $ID_24555091[[3]]
    ## Linkout from PubMed Central
    ##  $Url: http://www.ncbi.nlm.nih.go ...
    ##
    ## $ID_24555091[[4]]
    ## Linkout from PubMed Central Canada
    ##  $Url: http://pubmedcentralcanada ...
Each of those elements has a lot of information, but the URLs for each object
are probably the most important. For this reason,
rentrez provides a function
to get just the URLs:
    linkout_urls(taxize_links)
    ## $ID_24555091
    ## [1] "http://f1000research.com/articles/10.12688/f1000research.2-191.v2/doi"
    ## [2] "http://europepmc.org/abstract/MED/24555091"
    ## [3] "http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24555091/"
    ## [4] "http://pubmedcentralcanada.ca/pmcc/articles/pmid/24555091"
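Once the URLs are in hand, plain base R is enough to pick out a particular provider. The sketch below assumes the result is a named list of character vectors, as the printed output suggests. The object `urls` is a hypothetical stand-in built from the URLs shown above, so the example runs without contacting the NCBI.

```r
# Hypothetical stand-in for the value of linkout_urls(taxize_links),
# built from the URLs printed in the post.
urls <- list(
  ID_24555091 = c(
    "http://f1000research.com/articles/10.12688/f1000research.2-191.v2/doi",
    "http://europepmc.org/abstract/MED/24555091",
    "http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/24555091/",
    "http://pubmedcentralcanada.ca/pmcc/articles/pmid/24555091"
  )
)

# For each ID, keep only the links hosted by PubMed Central at the NCBI.
pmc_urls <- lapply(urls, function(u) {
  grep("ncbi.nlm.nih.gov/pmc/", u, fixed = TRUE, value = TRUE)
})
pmc_urls
```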
Web History features are easier to use
The NCBI provides a "Web History" feature to let users store the results of their
searches on the NCBI's servers and refer to those results without having to
pass unique IDs back and forth between computers. These features have always been
available in rentrez, but this release makes them easier to use.
Specifically, when the new optional argument
use_history is set to TRUE,
functions such as entrez_search will return a
web_history object, which can be used in place of unique
IDs in calls to other rentrez functions such as entrez_fetch.
To demonstrate, let's search for PubMed articles about the ciliate genus Tetrahymena:
    Tet_papers <- entrez_search(db="pubmed", term="Tetrahymena[ORGN]", use_history=TRUE)
    Tet_papers
    ## Entrez search result with 6599 hits (object contains 20 IDs and a web_history object)
    ##  Search term (as translated):  "tetrahymena"[MeSH Terms] OR "tetrahymena"[All Fie ...
Now that we have a web_history object, we can use that to retrieve XML representations
of the first 10 records:
    recs <- entrez_fetch(db="pubmed", web_history=Tet_papers$web_history,
                         retmax=10, rettype="xml")
It's easier to keep track of which records are linked to other records
By default, when
entrez_link gets a vector of more than one unique ID, it
returns sets of linked-IDs that match any of the IDs in the original call.
That means the user loses track of the mapping between the original IDs and those
from the linked database.
As of this release,
rentrez supports the NCBI's
by_id mode, which solves this problem.
Setting the new argument by_id to
TRUE returns a list, with each element of
that list containing links for only one ID. To demonstrate, let's find protein
sequences associated with specific genes in the NCBI gene database:
    all_links <- entrez_link(db="protein", dbfrom="gene", id=c(93100, 223646), by_id=TRUE)
    all_links
    ## List of 2 elink objects,each containing
    ##  $links: IDs for linked records from NCBI
As you can see, printing the returned object lets you know what each element
contains, and you can extract the specific links you are looking for easily:
    lapply(all_links, function(x) x$links$gene_protein)
    ## [[1]]
    ##  [1] "768043930" "767953815" "558472750" "194394158" "166221824"
    ##  [6] "154936864" "119602646" "119602645" "119602644" "119602643"
    ## [11] "119602642" "37787309"  "37787307"  "37787305"  "33991172"
    ## [16] "21619615"  "10834676"
    ##
    ## [[2]]
    ##  [1] "148697547" "148697546" "81899807"  "74215266"  "74186774"
    ##  [6] "37787317"  "37589273"  "31982089"  "26339824"  "26329351"
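The returned list is unnamed, so one way to keep the gene-to-protein mapping explicit is to label each element with the ID that produced it and flatten the result into a two-column table. The sketch below rests on one assumption: that the list elements come back in the same order as the IDs passed to entrez_link. The object `protein_links` is a hypothetical, truncated stand-in for the output above, so the example runs offline.

```r
# Gene IDs in the order they were passed to entrez_link.
gene_ids <- c("93100", "223646")

# Hypothetical, truncated stand-in for
# lapply(all_links, function(x) x$links$gene_protein)
protein_links <- list(
  c("768043930", "767953815"),
  c("148697547", "148697546", "81899807")
)

# Assumption: results are returned in the same order as the input IDs.
names(protein_links) <- gene_ids

# Flatten to one row per (gene, protein) pair.
mapping <- data.frame(
  gene_id    = rep(names(protein_links), lengths(protein_links)),
  protein_id = unlist(protein_links, use.names = FALSE),
  stringsAsFactors = FALSE
)
mapping
```

A long table like this is convenient for merging with other per-gene or per-protein data.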
There are also numerous small changes that improve
rentrez, fix bugs and
extend the package's documentation. We hope you find this new release helpful,
and as always we welcome bug reports via the package's GitHub repository.