**DataScience+**, and kindly contributed to R-bloggers)

Scholar indices are intended to measure the contributions of authors to their fields of research. Jorge E. Hirsch suggested the h-index in 2005 as an author-level metric intended to measure both the productivity and citation impact of the publications of an author. An author has index h if h of his or her N papers have at least h citations each, and the other (N-h) papers have no more than h citations each.

In response to a comment, we will use our trusty RISmed package and the PubMed database to develop a script for calculating an h-index, as well as two similar metrics, the m-quotient and the g-index. Here is the code to conduct the search. The citation counts are stored in the object returned by `EUtilsSummary()` and are retrieved with `Cited()`.

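```r
library(RISmed)

# Search PubMed for the author and pull the citation counts of the hits.
x <- "Yi-Kuo Yu"
res <- EUtilsSummary(x, type = "esearch", db = "pubmed", datetype = "pdat",
                     mindate = 1900, maxdate = 2015, retmax = 500)
citations <- Cited(res)
citations <- as.data.frame(citations)
```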

## h-index

Calculating the h-index is just a matter of cleverly arranging the data. Above, we created a data frame with one column containing all the values of `Cited()` in our search. We will sort them in descending order, then make a new column with the index values. The highest index value that is less than or equal to the number of citations in its row is that author’s h-index. The following code will return that index number.

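```r
# Sort the citation counts in descending order. Subsetting the single-column
# data frame drops it to a vector, so rebuild the data frame afterwards.
citations <- citations[order(citations$citations, decreasing = TRUE), ]
citations <- as.data.frame(citations)

# Add the rank (1 = most cited) as a numeric id column.
citations <- cbind(id = rownames(citations), citations)
citations$id <- as.numeric(as.character(citations$id))

# The h-index is the largest rank that does not exceed its citation count.
hindex <- max(which(citations$id <= citations$citations))
hindex
# 12
```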

Here is the data frame we created above that shows that Dr. Yi-Kuo Yu has an h-index of 12, since he has 12 publications with 12 or more citations.

```r
citations
#   id citations
#    1       181
#    2        62
#    3        34
#    4        31
#    5        23
#    6        19
#    7        19
#    8        18
#    9        14
#   10        14
#   11        13
#   12        13
#   13        10
#   14         8
```
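As a cross-check that does not need a PubMed query, the same calculation can be sketched on a plain numeric vector. This is only an illustration: `h_index()` is a hypothetical helper name, not a RISmed function, and the vector holds just the 14 citation counts listed above.

```r
# Citation counts copied from the listing above (top 14 papers only).
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13, 10, 8)

# h_index() is a hypothetical helper, not part of RISmed.
h_index <- function(cites) {
  cites <- sort(cites, decreasing = TRUE)
  # Rank i counts toward h when the i-th most cited paper has >= i citations.
  # After sorting, the comparison is TRUE for ranks 1..h and FALSE afterwards,
  # so counting the TRUE values gives h.
  sum(cites >= seq_along(cites))
}

h_index(cites)
# 12
```

Counting the `TRUE` values rather than taking `max(which(...))` means an author with no qualifying papers gets 0 instead of a warning and `-Inf`.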

## m-quotient

Although the h-index is a useful metric to measure an author’s impact, it has some disadvantages. For instance, a long, less impactful career will typically outscore a superstar junior scientist. For these cases, the m-quotient divides the h-index by the number of years since the author’s first publication. In this sense it is a way to normalize by career span.

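```r
# Publication years of the retrieved records.
y <- YearPubmed(EUtilsGet(res))
low <- min(y)
high <- max(y)

# Years between the first and the most recent publication.
den <- high - low
mquotient <- hindex / den
mquotient
# 0.92
```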
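The arithmetic can be sketched without RISmed. Note the assumptions: `m_quotient()` is a hypothetical helper, and the publication years below are made up for illustration. They are chosen only so that the span is 13 years, which is what an h-index of 12 and a quotient of 0.92 imply.

```r
# m_quotient() is a hypothetical helper, not part of RISmed.
m_quotient <- function(h, years) {
  span <- max(years) - min(years)
  # If every paper appeared in the same year the denominator would be zero.
  if (span == 0) return(NA_real_)
  h / span
}

# Illustrative publication years only, not the author's actual record.
years <- c(2002, 2008, 2015)
round(m_quotient(12, years), 2)
# 0.92
```

Returning `NA` for a zero-year span is one possible design choice; an author whose papers all fall in a single calendar year has no meaningful career length to normalize by.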

## g-index

Another weakness of the h-index is that it doesn’t take into account highly cited publications. Once a paper counts toward h, any citations beyond h are ignored, so an author with a few very highly cited publications can get the same h-index as a relatively obscure author. The g-index was developed to address this situation. The g-index is the largest rank (where papers are arranged in decreasing order of the number of citations they received) such that the first g papers have (together) at least g^2 citations. Here is code to calculate the g-index.

```r
# Squared rank and running total of citations, in descending citation order.
citations$square <- citations$id^2
citations$sums <- cumsum(citations$citations)

# The g-index is the largest rank whose square does not exceed the running total.
gindex <- max(which(citations$square <= citations$sums))
gindex
# 22
```

We made two new columns, one for the squares of the index column and one for the cumulative sum of the citations in descending order. Similar to the h-index, we need the highest index whose squared value does not exceed the cumulative sum. Our output with the two new columns below shows that Dr. Yu has a g-index of 22, driven largely by his two most cited publications.

```r
citations
#   id citations square sums
#    1       181      1  181
#    2        62      4  243
#    3        34      9  277
#    4        31     16  308
#    5        23     25  331
#    6        19     36  350
#    7        19     49  369
#    8        18     64  387
#    9        14     81  401
#   10        14    100  415
#   11        13    121  428
#   12        13    144  441
#   13        10    169  451
#   14         8    196  459
#   15         7    225  466
#   16         7    256  473
#   17         7    289  480
#   18         7    324  487
#   19         7    361  494
#   20         7    400  501
#   21         6    441  507
#   22         5    484  512
#   23         4    529  516
#   24         4    576  520
```
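The g-index can also be sketched as a small standalone function. As before, this is an illustration under stated assumptions: `g_index()` is a hypothetical helper name, and the vector contains only the 24 citation counts shown in the listing above.

```r
# Citation counts copied from the listing above (24 papers).
cites <- c(181, 62, 34, 31, 23, 19, 19, 18, 14, 14, 13, 13,
           10, 8, 7, 7, 7, 7, 7, 7, 6, 5, 4, 4)

# g_index() is a hypothetical helper, not part of RISmed.
g_index <- function(cites) {
  cites <- sort(cites, decreasing = TRUE)
  ranks <- seq_along(cites)
  # TRUE where the top-g papers together have at least g^2 citations.
  ok <- cumsum(cites) >= ranks^2
  if (!any(ok)) return(0L)
  max(which(ok))
}

g_index(cites)
# 22
```

In this form the g-index can never exceed the number of papers in the vector, which matches the behavior of the data frame approach above.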

Check out the updated Shiny App to let the App do the work for you.
