Getting ecology and evolution journal titles from R

[This article was first published on Recology - R, and kindly contributed to R-bloggers].


So I want to mine some #altmetrics data for some research I’m thinking about doing. The steps would be:

  • Get journal titles for ecology and evolution journals.
  • Get DOIs for all papers in all the above journal titles.
  • Get altmetrics data on each DOI.
  • Do some fancy analyses.
  • Make some pretty figs.
  • Write up results.

It’s early days, so I’m just working on the first step. However, getting a list of journals in ecology and evolution is frustratingly hard if you (1) want to avoid Thomson Reuters and (2) want a machine interface to do it (read: an API).

Unfortunately, Mendeley’s API does not have methods for getting a list of journals by field, or at least I don’t know how to do it using their API. No worries though – Crossref comes to the rescue. Here’s my attempt at this using the Crossref OAI-PMH interface.
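OAI-PMH pages through long lists: a ListSets response can carry a resumptionToken, and the next request passes that token back. In OAI-PMH 2.0 the resumptionToken is an exclusive argument, so a follow-up request contains only the verb and the token. A minimal base R sketch of building those request URLs, assuming the Crossref handler URL used in this post; `listsets_url()` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build an OAI-PMH ListSets request URL.
# Assumes the Crossref OAI handler URL used in this post.
listsets_url <- function(token = NULL,
                         base = "http://oai.crossref.org/OAIHandler") {
  query <- "verb=ListSets"
  if (!is.null(token)) {
    # resumptionToken is exclusive in OAI-PMH: only verb + token are sent.
    # URL-encode it in case the server hands back reserved characters.
    query <- paste0(query, "&resumptionToken=",
                    utils::URLencode(token, reserved = TRUE))
  }
  paste0(base, "?", query)
}

listsets_url()                                         # first page
listsets_url("c65ebc3f-b540-4672-9c00-f3135bf849e3")   # a later page
```

The original loop builds the same string with `paste()`; encoding the token is a precaution for tokens that are not plain UUIDs.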


I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine – hopefully yours too!

library(XML)
library(RCurl)

token <- "characters"  # define an iterator, also used for getting the resumptionToken
nameslist <- list()  # define empty list to put journal titles into
while (is.character(token)) {
    baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"
    if (token == "characters") {
        tok2 <- NULL  # first request: no resumptionToken yet
    } else {
        tok2 <- paste("&resumptionToken=", token, sep = "")
    }
    query <- paste(baseurl, tok2, sep = "")
    crsets <- xmlToList(xmlParse(getURL(query)))
    titles <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
    nameslist[[token]] <- titles  # store this page of titles under its token
    # look for a resumptionToken; if there is none, we have all the data
    newtoken <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
    if (inherits(newtoken, "try-error")) {
        break  # no more data: leave the loop without throwing an error
    }
    token <- newtoken
}
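Stripped of the network call, the loop is a standard pagination pattern: fetch a page, keep its titles, and continue only while the response carries a new token. The sketch below assumes a hypothetical `fetch_page()` that returns a list with `titles` and `token` (NULL or empty on the last page, both of which OAI-PMH permits); here it serves made-up pages from memory so the stopping logic can be checked offline.

```r
# Toy stand-in for the Crossref responses: three pages keyed by token.
# Page contents and token values are made up for illustration.
pages <- list(
  start = list(titles = c("Ecology", "Oikos"), token = "t2"),
  t2    = list(titles = c("Evolution", "Systematic Biology"), token = "t3"),
  t3    = list(titles = "The American Naturalist", token = "")
)

# Hypothetical fetch function: a NULL token means "first request"
fetch_page <- function(token = NULL) {
  if (is.null(token)) pages[["start"]] else pages[[token]]
}

collect_all <- function() {
  out <- list()
  token <- NULL
  repeat {
    page <- fetch_page(token)
    out[[length(out) + 1]] <- page$titles
    # Stop when the resumption token is missing or empty
    if (is.null(page$token) || !nzchar(page$token)) break
    token <- page$token
  }
  unlist(out, use.names = FALSE)
}

toy_names <- collect_all()
length(toy_names)  # 5
```

Using `repeat` with an explicit `break` keeps the exit condition in one place and ends the script cleanly instead of via `stop()`.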

Yay! Hopefully it worked if you tried it. Let’s see how long the list of journal titles is.

sapply(nameslist, length)  # number of titles in each page of results
                          characters c65ebc3f-b540-4672-9c00-f3135bf849e3 
                               10001                                10001 
6f61b343-a8f4-48f1-8297-c6f6909ca7f7 
                                6864
allnames <- do.call(c, nameslist)  # combine into one character vector
length(allnames)
[1] 26866
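`do.call(c, nameslist)` returns a single character vector rather than a list, and `c()` builds element names from the list names (for example `characters1`, `characters2`, and so on). A small base R check on a made-up list shaped like `nameslist`, with `unlist(use.names = FALSE)` as an equivalent that skips those names:

```r
# Toy list shaped like nameslist: one character vector per page
pieces <- list(characters = c("Oikos", "Ecology"), tok2 = "Evolution")

combined <- do.call(c, pieces)
names(combined)   # "characters1" "characters2" "tok2"
lengths(pieces)   # per-page counts, same as sapply(pieces, length)

# unlist() gives the same values without the generated names
plain <- unlist(pieces, use.names = FALSE)
identical(unname(combined), plain)  # TRUE
```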

Now, let’s use some regex to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^ symbol says “the string must start here”. The \\s means whitespace (the backslash is doubled because the pattern sits inside an R string). The | means OR, so "^[Ee]cology|\\s[Ee]cology" matches “ecology” either at the very start of a title or right after a space. The [] lets you specify a set of letters you are looking for, e.g., [Ee] means capital E OR lowercase e. I threw in titles that had the words systematic and naturalist too. I also trimmed any leading and trailing whitespace using the stringr package.

library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])

ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist))  # combine and drop duplicates
ecoevotitles <- str_trim(ecoevotitles, side = "both")  # trim whitespace, if any
length(ecoevotitles)
[1] 188
# Just the first ten titles
ecoevotitles[1:10]
 [1] "Microbial Ecology in Health and Disease"           
 [2] "Population Ecology"                                
 [3] "Researches on Population Ecology"                  
 [4] "Behavioral Ecology and Sociobiology"               
 [5] "Microbial Ecology"                                 
 [6] "Biochemical Systematics and Ecology"               
 [7] "FEMS Microbiology Ecology"                         
 [8] "Journal of Experimental Marine Biology and Ecology"
 [9] "Applied Soil Ecology"                              
[10] "Forest Ecology and Management"
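To see what patterns like these do and do not catch, here is a small offline check on a made-up vector of titles (toy data, not Crossref output), using base R’s `grepl()` and `trimws()` in place of stringr’s `str_detect()` and `str_trim()`. Note that the invented title “Paleoecology Today” is missed: the pattern only matches “ecology” at the start of a title or right after whitespace, so compound words slip through.

```r
# Toy titles, made up for illustration
titles <- c("Ecology Letters", "Landscape Ecology", "Paleoecology Today",
            "Journal of Evolutionary Biology", "Systematic Botany",
            "  The American Naturalist ", "Journal of Physics")

# Patterns modelled on the ones in this post
pats <- c(eco  = "^[Ee]cology|\\s[Ee]cology",
          evo  = "^[Ee]volution|\\s[Ee]volution",
          syst = "^[Ss]ystematic|\\s[Ss]ystematic",
          nat  = "[Nn]aturalist")

# One row per title, one logical column per pattern
hits <- sapply(pats, function(p) grepl(p, titles))
rownames(hits) <- trimws(titles)
hits

# Keep titles matching any pattern, trimmed and de-duplicated
keep <- unique(trimws(titles[rowSums(hits) > 0]))
keep
```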

Get the .Rmd file used to create this post at my github account.


Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.

