
Scholar indices are intended to measure the contributions of authors to their fields of research. Jorge E. Hirsch suggested the h-index in 2005 as an author-level metric intended to measure both the productivity and citation impact of the publications of an author. An author has index h if h of his or her N papers have at least h citations each, and the other (N-h) papers have no more than h citations each.
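
As a quick illustration of this definition, here is a minimal sketch in base R. The helper name `h_index` and the toy citation counts are assumptions made for this example only; they are not part of RISmed or of any real author's record.

```r
# Hypothetical helper (not part of RISmed): h-index of a vector of citation counts.
h_index <- function(cites) {
  # Drop missing counts and rank papers from most to least cited.
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  # Paper i counts toward h while it has at least i citations.
  # Because the counts are sorted, this condition holds for a prefix only,
  # so the number of TRUE values is the h-index.
  sum(cites >= seq_along(cites))
}

toy <- c(3, 10, 5, 8, 4)  # made-up citation counts
h_index(toy)              # 4: four papers have at least 4 citations each
```

With these counts the fifth-ranked paper has only 3 citations, so the index stops at 4.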

In response to a comment, we will use our trusty RISmed package and the PubMed database to develop a script for calculating an h-index, as well as two similar metrics, the m-quotient and the g-index. Here is the code to conduct the search; the citation counts are stored in the EUtilsSummary() object and retrieved with Cited().

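library(RISmed)

# Author to search for in PubMed
x <- "Yi-Kuo Yu"
res <- EUtilsSummary(x, type="esearch", db="pubmed", datetype='pdat', mindate=1900, maxdate=2015, retmax=500)
# One citation count per retrieved record, stored as a one-column data frame
citations <- Cited(res)
citations <- as.data.frame(citations)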


## h-index

Calculating the h-index is just a matter of cleverly arranging the data. Above, we created a data frame with one column containing all the values of Cited() in our search. We will sort them in descending order, then make a new column with the index values (the rank of each paper). The highest index value that is less than or equal to its number of citations is that author’s h-index. The following code will return that index number.

# Sort citation counts from highest to lowest
citations <- citations[order(citations$citations,decreasing=TRUE),]
citations <- as.data.frame(citations)
# Add the rank of each paper as a numeric id column
citations <- cbind(id=rownames(citations),citations)
citations$id <- as.character(citations$id)
citations$id <- as.numeric(citations$id)
# Highest rank whose citation count is at least that rank
hindex <- max(which(citations$id<=citations$citations))

hindex
12

Here is the data frame we created above. It shows that Dr. Yi-Kuo Yu has an h-index of 12, since he has 12 publications with 12 or more citations, while the 13th has only 10.

citations

id citations
1       181
2        62
3        34
4        31
5        23
6        19
7        19
8        18
9        14
10        14
11        13
12        13
13        10
14         8

## m-quotient

Although the h-index is a useful metric to measure an author’s impact, it has some disadvantages. For instance, a long, less impactful career will typically outscore a superstar junior scientist. For these cases, the m-quotient divides the h-index by the number of years since the author’s first publication. In this sense it is a way to normalize by career span.

y <- YearPubmed(EUtilsGet(res))
low <- min(y)
high <- max(y)
# Career span taken as the years between the first and last retrieved publication
den <- high-low
mquotient <- hindex/den

mquotient
0.92

## g-index

Another weakness of the h-index is that it gives no extra credit for very highly cited publications: an author with a few heavily cited papers can receive the same h-index as a relatively obscure author. The g-index was developed to address this situation. The g-index is the largest rank g (with papers arranged in decreasing order of the number of citations they received) such that the first g papers together have at least g^2 citations. Here is code to calculate the g-index.

citations$square <- citations$id^2
citations$sums <- cumsum(citations$citations)
# Highest rank whose square does not exceed the cumulative citation count
gindex <- max(which(citations$square<=citations$sums))

gindex
22


We made two new columns, one for the squares of the index column and one for the cumulative sum of the citations in descending order. Similar to the h-index, we need the highest index whose square is no greater than the cumulative sum. Our output with the two new columns below shows that Dr. Yu has a g-index of 22, driven largely by his two most-cited publications.

citations

id citations square sums
1       181      1  181
2        62      4  243
3        34      9  277
4        31     16  308
5        23     25  331
6        19     36  350
7        19     49  369
8        18     64  387
9        14     81  401
10        14    100  415
11        13    121  428
12        13    144  441
13        10    169  451
14         8    196  459
15         7    225  466
16         7    256  473
17         7    289  480
18         7    324  487
19         7    361  494
20         7    400  501
21         6    441  507
22         5    484  512
23         4    529  516
24         4    576  520
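
For reuse outside of a PubMed search, the g-index and m-quotient steps can be sketched as small base R functions that work on plain vectors. The names `g_index` and `m_quotient` are hypothetical helpers, not RISmed functions. The citation counts are the 24 values listed in the table above, and the publication years are made up, chosen only so that the span is 13 years, which is consistent with the m-quotient of 0.92 reported earlier.

```r
# Hypothetical helpers (not part of RISmed).

# g-index: largest rank g such that the top g papers together have
# at least g^2 citations. This sketch caps g at the number of papers.
g_index <- function(cites) {
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  ranks <- seq_along(cites)
  ok <- which(ranks^2 <= cumsum(cites))
  if (length(ok) == 0) 0L else max(ok)
}

# m-quotient: h-index divided by the span between first and last
# publication year. Returns NA when the span is zero.
m_quotient <- function(h, years) {
  span <- max(years) - min(years)
  if (span <= 0) return(NA_real_)
  h / span
}

# Citation counts copied from the table above
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13,
           10, 8, 7, 7, 7, 7, 7, 7, 6, 5, 4, 4)
g_index(cites)                    # 22

years <- c(2002, 2008, 2015)      # made-up years spanning 13 years
round(m_quotient(12, years), 2)   # 0.92
```

Returning NA for a zero-year span is a design choice of this sketch; it avoids the division by zero that would otherwise occur for an author whose papers all appeared in the same year.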


Check out the updated Shiny App to let the App do the work for you.