Using R to get h5-index for a list of journals

[This article was first published on Minding the Brain, and kindly contributed to R-bloggers.]

In my last blog post I wrote about impact factors and h-index for different journals. That got me wondering what the h5-index is for each of the journals that I read and might want to publish in. (Google Scholar's h5-index is the h-index computed over the articles a journal published in the last five complete years.) I could look them all up individually, but that sounds tedious, so I’d much rather get R to do it for me. I’d never done this kind of thing with R before, so it took a little while, but I wrote a simple function that takes a journal name and returns its h5-index.

You have to spell the journal’s title exactly as it appears in Google Scholar, because the function keeps only the row of the results table whose title matches exactly, but otherwise it seems to be reasonably robust. Here it is:

getH5 <- function(journal.name){
  require(RCurl)
  require(XML)
  require(stringr)
  #replace spaces in journal name with "+" for google search
  search.str <- str_replace_all(journal.name, " ", "+")
  #make the URL
  url <- paste("http://scholar.google.com/citations?hl=en&view_op=search_venues&vq=", search.str, sep = "")
  #retrieve the webpage from the URL
  webpage <- getURL(url)
  #pull the table of h5 values out of the page
  x <- readHTMLTable(webpage)
  #get the h5-index for the specified journal (there are likely to be other partial matches), use NA if the journal name is not found
  if(length(x)==0){
    h5 <- NA
    warning("Could not find any publications matching ", journal.name, immediate.=TRUE)
  } else {
    tab <- x[[1]]
    h5 <- as.numeric(as.character(tab$"h5-index"[tab$Publication == journal.name]))
    if(length(h5)==0) h5 <- NA
  }
  #arrange in a convenient form
  dat <- data.frame(Journal = journal.name, H5 = h5)
  return(dat)
}
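
One thing to watch in the URL step: swapping spaces for "+" leaves characters like "&" untouched, so a title such as "Memory & Cognition" could be read as two query parameters. Here is a minimal sketch of one way around that, assuming base R's URLencode is acceptable for the search string. The helper name makeVenueUrl is made up for illustration and is not part of the function above.

```r
# Hypothetical helper (not in the original function): build the venue-search
# URL with the journal name percent-encoded by base R's URLencode().
makeVenueUrl <- function(journal.name) {
  base <- "http://scholar.google.com/citations?hl=en&view_op=search_venues&vq="
  # reserved = TRUE also encodes reserved characters such as "&", ";" and ","
  paste0(base, URLencode(journal.name, reserved = TRUE))
}

makeVenueUrl("Memory & Cognition")
```

Spaces come out as %20 rather than "+", which should be equivalent in a query string, but I have not checked how Google Scholar treats the two forms.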

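The exact-match lookup inside getH5 can be tried without touching the web. This is only a sketch: the mock table stands in for what readHTMLTable returns (its real layout may differ), the three values are copied from the results table in this post, and lookupH5 is a hypothetical name for the lookup lines pulled out of the function.

```r
# Mock results table standing in for the output of readHTMLTable();
# the h5 values are copied from the results listed in this post.
tab <- data.frame(
  Publication = c("Brain", "Brain and Language", "Brain and Cognition"),
  `h5-index`  = c("103", "39", "39"),
  check.names = FALSE,       # keep the column name "h5-index" as-is
  stringsAsFactors = TRUE    # scraped tables often arrive as factors
)

# Hypothetical helper mirroring the lookup step inside getH5
lookupH5 <- function(tab, journal.name) {
  h5 <- as.numeric(as.character(tab$"h5-index"[tab$Publication == journal.name]))
  if (length(h5) == 0) h5 <- NA   # no exact match for this title
  h5
}

lookupH5(tab, "Brain")               # picks only the exact row, not the partial matches
lookupH5(tab, "brain and language")  # capitalisation differs, so no row matches
```

A search for "Brain" returns several partial matches, and the equality test is what keeps only the intended row. It is also why a small difference in spelling or capitalisation ends up as NA.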

The function is designed for a single journal name, so for my set of journals (stored in journals), I used the adply function from the plyr package to call it once per name:

library(plyr)
jh <- adply(journals, 1, getH5) #call getH5 once for each journal name
jh.sorted <- jh[order(-jh$H5),] #sort them by decreasing h5-index
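
The same apply-then-sort step can be sketched in base R, assuming journals is a plain character vector. To keep the example runnable offline, fakeGetH5 and known are hypothetical stand-ins for getH5, with values copied from the results in this post and one deliberately unknown title.

```r
# Stand-in for getH5 so the sketch runs without a network connection;
# the values are copied from the results listed in this post.
known <- c("Cognition" = 60, "Brain" = 103, "Neurocase" = 18)
fakeGetH5 <- function(journal.name) {
  h5 <- unname(known[journal.name])   # NA when the name is not in `known`
  data.frame(Journal = journal.name, H5 = h5)
}

journals <- c("Cognition", "Brain", "Not A Journal", "Neurocase")

# Call the lookup once per name and stack the one-row data frames
jh <- do.call(rbind, lapply(journals, fakeGetH5))

# Sort by decreasing h5-index; order() puts NA values last by default
jh.sorted <- jh[order(-jh$H5), ]
jh.sorted
```

Journals that were not found keep their NA and fall to the bottom of the sorted list, which makes them easy to spot and fix by hand.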


After a little clean-up, I got what I was looking for:


Journal H5
PLoS ONE 120
Nature Neuroscience 115
Brain 103
Trends in Cognitive Sciences 86
Psychological Science 79
Neuropsychologia 67
Journal of Cognitive Neuroscience 64
Psychological Review 62
Cognition 60
Neuroscience Letters 50
Experimental Brain Research 46
The Journal of the Acoustical Society of America 46
Psychonomic Bulletin & Review 45
Journal of Experimental Psychology: General 44
Biological Psychology 44
Journal of Memory and Language 43
Journal of Experimental Psychology: Learning, Memory, and Cognition 43
Journal of Experimental Psychology: Human Perception and Performance 42
Cortex; a journal devoted to the study of the nervous system and behavior 41
Psychology and Aging 41
Brain and Language 39
Brain and Cognition 39
Cognitive Science 38
BMC Neuroscience 38
The Quarterly Journal of Experimental Psychology 36
Ear and Hearing 35
Cognitive Psychology 34
Acta Psychologica 34
Cognitive, Affective, & Behavioral Neuroscience 33
Memory & Cognition 32
Frontiers in Human Neuroscience 29
Attention, Perception, & Psychophysics 28
Language and Cognitive Processes 27
Topics in Cognitive Science 23
Cognitive Neuropsychology 21
Cognitive and Behavioral Neurology 18
Neurocase 18
Cognitive neuroscience 10
