Getting ecology and evolution journal titles from R

Posted on August 31, 2012 by Recology - R in R bloggers

[This article was first published on Recology - R, and kindly contributed to R-bloggers].

So I want to mine some #altmetrics data for some research I’m thinking about doing. The steps would be:

Get journal titles for ecology and evolution journals.
Get DOIs for all papers in all the above journal titles.
Get altmetrics data on each DOI.
Do some fancy analyses.
Make some pretty figs.
Write up results.

It’s early days, so I'm just working on the first step. However, getting a list of journals in ecology and evolution turns out to be frustratingly hard if you (1) are trying to avoid Thomson Reuters, and (2) want a machine-readable way to do it (read: API).

Unfortunately, Mendeley’s API does not have methods for getting a list of journals by field, or at least I don’t know how to do it using their API. No worries though – Crossref comes to save the day. Here’s my attempt at this using the Crossref OAI-PMH.

I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine – hopefully yours too!

library(XML)
library(RCurl)

token <- "characters"  # sentinel for the first request; afterwards holds the resumptionToken
nameslist <- list()  # empty list to collect journal titles
baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets"
while (is.character(token)) {
    if (token == "characters") {
        tok2 <- NULL  # first request: no resumptionToken yet
    } else {
        tok2 <- paste("&resumptionToken=", token, sep = "")
    }
    query <- paste(baseurl, tok2, sep = "")
    crsets <- xmlToList(xmlParse(getURL(query)))
    # pull the setName (journal title) out of every set in this batch
    setnames <- as.character(sapply(crsets[[4]], function(x) x[["setName"]]))
    nameslist[[token]] <- setnames
    # look for the next resumptionToken; if there is none, we are done
    nexttoken <- try(crsets[[2]]$.attrs[["resumptionToken"]], silent = TRUE)
    if (inherits(nexttoken, "try-error")) {
        break  # no more data
    } else token <- nexttoken
}
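
The loop above follows the usual resumption-token pattern: request a batch, store it, and repeat with the returned token until none comes back. Here is a minimal offline sketch of just that pattern. Note that `fake_pages`, `fetch_page()` and `collect_titles()` are hypothetical stand-ins I made up for illustration (no network, no Crossref data), not functions from any package.

```r
# Hypothetical stand-in for the remote service: each "page" holds some
# titles plus the token for the next page (NULL on the last page).
fake_pages <- list(
    start = list(titles = c("Journal A", "Journal B"), token = "tok1"),
    tok1  = list(titles = c("Journal C"), token = "tok2"),
    tok2  = list(titles = c("Journal D", "Journal E"), token = NULL)
)

# Assumed helper: look up a page by its token instead of making a request
fetch_page <- function(token) fake_pages[[token]]

collect_titles <- function(fetch, first = "start") {
    token <- first
    pages <- list()
    while (!is.null(token)) {
        page <- fetch(token)
        pages[[token]] <- page$titles  # store this batch under its token
        token <- page$token            # NULL on the last page ends the loop
    }
    pages
}

pages <- collect_titles(fetch_page)
sapply(pages, length)  # number of titles per batch
```

Ending the loop on a missing token, rather than raising an error, means the collected list is still there to work with afterwards.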

Yay! Hopefully it worked if you tried it. Let's see how long the list of journal titles is.

sapply(nameslist, length)  # length of each list

                          characters c65ebc3f-b540-4672-9c00-f3135bf849e3 
                               10001                                10001 
6f61b343-a8f4-48f1-8297-c6f6909ca7f7 
                                6864

allnames <- do.call(c, nameslist)  # combine into a single character vector
length(allnames)

[1] 26866
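
As a quick sanity check on that step: `do.call(c, ...)` on a list of character vectors gives one long character vector, and its length should be the sum of the batch lengths. A small sketch with made-up titles (the `toy` list is an assumption for illustration, not Crossref output):

```r
# Made-up batches, named the way nameslist is named (by token)
toy <- list(
    characters = c("Journal A", "Journal B"),
    tok1 = c("Journal C")
)

combined <- do.call(c, toy)

is.character(combined)                         # a character vector, not a list
length(combined) == sum(sapply(toy, length))   # lengths add up
# base R's unlist() yields the same values
identical(unname(combined), unname(unlist(toy)))
```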

Now, let's use some `regex` to pull out the journal titles that are likely ecology and evolutionary biology journals. The `^` symbol says "the string must start here". The `\\s` means whitespace. The `[]` lets you specify a set of letters you are looking for, e.g., `[Ee]` means capital `E` OR lowercase `e`. I threw in titles that had the words systematic and naturalist too. I also trimmed any whitespace using the `stringr` package.
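
To see what that kind of pattern does and does not catch, here is a small sketch on a handful of example titles. I'm using base R's `grepl()` here only so it runs without extra packages; the titles are chosen just to exercise the pattern, and some are invented.

```r
# Example titles chosen only to exercise the pattern (some are invented)
titles <- c("Ecology Letters", "Population Ecology", "Paleoecology Today",
            "Ecological Applications", "Journal of Zoology")

# "ecology" at the start of the title, or right after whitespace
pattern <- "^[Ee]cology|\\s[Ee]cology"

hits <- grepl(pattern, titles)
titles[hits]
```

Only "Ecology Letters" and "Population Ecology" match. "Paleoecology Today" is skipped because "ecology" is not at the start or preceded by whitespace, and "Ecological Applications" is skipped because the pattern needs the full word "ecology". A looser pattern like `[Ee]colog` would pick up both, at the cost of more false positives.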

library(stringr)

ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")])
evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")])
systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]ystematic")])
naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")])

ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist))  # combine and drop duplicates
ecoevotitles <- str_trim(ecoevotitles, side = "both")  # trim whitespace, if any
length(ecoevotitles)

[1] 188
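
One thing to watch: `unique()` runs before `str_trim()` above, so two titles that differ only in surrounding whitespace would both survive. Whether that actually happens in the Crossref data I haven't checked. Here is a sketch of the alternative order, using base R's `trimws()` (available from R 3.2.0) on made-up input:

```r
# Made-up input with whitespace variants of the same title
raw <- c("Oikos", " Oikos", "Ecology Letters ", "Ecology Letters")

length(unique(raw))             # 4: whitespace variants count as distinct

cleaned <- unique(trimws(raw))  # trim first, then drop duplicates
cleaned
length(cleaned)                 # 2
```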

# Just the first ten titles
ecoevotitles[1:10]

 [1] "Microbial Ecology in Health and Disease"           
 [2] "Population Ecology"                                
 [3] "Researches on Population Ecology"                  
 [4] "Behavioral Ecology and Sociobiology"               
 [5] "Microbial Ecology"                                 
 [6] "Biochemical Systematics and Ecology"               
 [7] "FEMS Microbiology Ecology"                         
 [8] "Journal of Experimental Marine Biology and Ecology"
 [9] "Applied Soil Ecology"                              
[10] "Forest Ecology and Management"

Get the .Rmd file used to create this post at my github account.

Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in RStudio.
