New R package: scholar

My new R package, scholar, has just been posted on CRAN.

The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code), but here are some quick highlights.

Get profile data on a scholar

Not everyone has a Google Scholar profile page, but if they do, you can find it using the search box in the corner of any profile page. The resulting URL will contain a string that looks like user=B7vSqZsAAAAJ. The package refers to scholars by that id. So, for example, here is Richard Feynman’s data:


id <- 'B7vSqZsAAAAJ'
feynman <- get_profile(id)
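If you have a profile URL rather than the bare id, a small helper can pull out the user= string described above. This is only a sketch in base R: the function name extract_scholar_id is made up for illustration and is not part of the scholar package, and the example URL is an assumed shape rather than a guaranteed format.

```r
# Hypothetical helper (not part of the scholar package): extract the value of
# the `user` query parameter from a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  # Capture everything after "user=" up to the next "&" or "#"
  m <- regmatches(url, regexec("[?&]user=([^&#]+)", url))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Assumed URL shape, for illustration only
profile_url <- "https://scholar.google.com/citations?user=B7vSqZsAAAAJ&hl=en"
id <- extract_scholar_id(profile_url)
```

The helper returns NA when the URL has no user= parameter, so it is worth checking the result before passing it on to the package.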

Compare multiple scholars

You can also compare multiple scholars, for example, a Feynman/Hawking battle royale:

# Compare Richard Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')

# Compare their career trajectories, based on year of first citation
df <- compare_scholar_careers(ids)

[Figure: Citation histories of Richard Feynman and Stephen Hawking]

Predicting future h-index values

A scholar's h-index is the largest n such that they have published at least n papers that have each been cited at least n times. Acuna et al. published a method for predicting future h-index values based on historical citation rates. The original regressions were calibrated on neuroscience researchers, so using this in other fields may well end up predicting negative h-indices. However, there is an optional argument that allows you to redefine the 'top' journals in your field. No guarantees, but still, it's a bit of fun.

## Predict Daniel Acuna's h-index

id <- 'GAi23ssAAAAJ'
predict_h_index(id)
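To make the h-index definition above concrete, here is a minimal base R sketch that computes it from a vector of per-paper citation counts. The function h_index and the example counts are invented for illustration; this is not how the package or the Acuna et al. prediction works internally.

```r
# Illustrative helper (not part of the scholar package): compute an h-index
# from a vector of per-paper citation counts.
h_index <- function(cites) {
  cites <- sort(cites, decreasing = TRUE)
  # The i-th most cited paper counts toward h if it has at least i citations.
  # Because counts fall as ranks rise, the number of TRUEs is the h-index.
  sum(cites >= seq_along(cites))
}

# Hypothetical citation counts, for illustration only
example_cites <- c(10, 8, 5, 4, 3, 0)
h <- h_index(example_cites)  # 4: four papers have at least 4 citations each
```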

That's it! If you have any suggestions for new features, comments, etc., please let me know.

