My new R package, scholar, has just been posted on CRAN.
The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code), but here are some quick highlights.
Get profile data on a scholar
Not everyone has a Google Scholar profile page, but for those who do, you can find it using the search box in the corner of any profile page. The resulting URL will contain a string that looks like
user=B7vSqZsAAAAJ. To use the package, we need to reference scholars by that id. So, for example, here is Richard Feynman’s data:
library(scholar)
id <- 'B7vSqZsAAAAJ'
feynman <- get_profile(id)
feynman$name # Prints out his name
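If you already have a profile URL in hand, the id can be pulled out programmatically rather than copied by eye. The following is a minimal sketch in base R; extract_scholar_id is a hypothetical helper (it is not part of the scholar package), and the example URL assumes the usual user=... query parameter described above.

```r
# Hypothetical helper, not part of the scholar package:
# extract the value of the `user` query parameter from a profile URL.
extract_scholar_id <- function(url) {
  # regexec returns the full match plus the captured group
  m <- regmatches(url, regexec("[?&]user=([^&#]+)", url))[[1]]
  # No match: return NA instead of failing
  if (length(m) < 2) NA_character_ else m[2]
}

# Assumed URL shape, for illustration only
url <- "https://scholar.google.com/citations?user=B7vSqZsAAAAJ&hl=en"
id <- extract_scholar_id(url)
```

The resulting id can then be passed to get_profile() as before.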
Compare multiple scholars
You can also compare multiple scholars, for example, a Feynman/Hawking battle royale:
library(ggplot2) # Needed for the plot below
# Compare Richard Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')
# Compare their career trajectories, based on year of first citation
df <- compare_scholar_careers(ids)
ggplot(df, aes(x=career_year, y=cites)) + geom_line(aes(linetype=name)) + theme_bw()
Predicting future h-index values
A scholar’s h-index is the largest n such that they have published at least n papers that have each been cited at least n times. Acuna et al. published a method for predicting future h-index values based on historical citation rates. The original regressions were calibrated on neuroscience researchers, so using this in other fields may well end up predicting negative h-indices. However, there is an optional argument that allows you to re-define the ‘top’ journals in your field. No guarantees, but still, it’s a bit of fun.
## Predict Daniel Acuna's h-index
id <- 'GAi23ssAAAAJ'
predict_h_index(id)
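To make the h-index definition concrete, here is a minimal sketch that computes it from a vector of per-paper citation counts. The h_index function and the citation counts are made up for illustration; this is not how the scholar package computes anything, nor is it the Acuna et al. prediction method.

```r
# Hypothetical helper, not part of the scholar package:
# compute an h-index from per-paper citation counts.
h_index <- function(cites) {
  # Rank papers from most to least cited
  sorted <- sort(cites, decreasing = TRUE)
  # Count the ranks at which a paper has at least as many citations as its rank
  sum(sorted >= seq_along(sorted))
}

# Made-up citation counts for five papers
h_index(c(10, 8, 5, 4, 3))  # 4: four papers have at least 4 citations each
h_index(c(25, 8, 5, 3, 3))  # 3: only three papers have at least 3... and a 4th has just 3
```

Because the counts are sorted in decreasing order, the comparison is TRUE for the first h ranks and FALSE afterwards, so summing it gives h directly.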
That’s it! If you have any suggestions for new features, comments, etc., please let me know.