New R package: scholar

October 23, 2013

(This article was first published on James Keirstead » Rstats, and kindly contributed to R-bloggers)

My new R package, scholar, has just been posted on CRAN.

The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code), but here are some quick highlights.

Get profile data on a scholar

Not everyone has a Google Scholar profile page, but for those who do, you can find it using the search box in the corner of any profile page. The resulting URL will contain a string that looks like user=B7vSqZsAAAAJ. The package references scholars by that id. For example, here is Richard Feynman’s data:

library(scholar)

id <- 'B7vSqZsAAAAJ'
feynman <- get_profile(id)
feynman$name # Prints out his name
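
If you already have a profile URL, the id can be pulled out of it programmatically. The following is a minimal sketch in base R; the helper name extract_scholar_id and the example URL layout are assumptions for illustration and are not part of the scholar package.

```r
# Hypothetical helper (not part of scholar): extract the value of the
# `user` query parameter from a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  # Match the characters that follow "?user=" or "&user=" up to the next
  # "&" or "#"
  m <- regmatches(url, regexpr("(?<=[?&]user=)[^&#]+", url, perl = TRUE))
  # Return NA if the URL carries no user parameter
  if (length(m) == 0) return(NA_character_)
  m
}

# Assumed URL layout, for illustration only
url <- "https://scholar.google.com/citations?user=B7vSqZsAAAAJ&hl=en"
extract_scholar_id(url) # "B7vSqZsAAAAJ"
```

The returned string can then be passed to get_profile() as in the example above.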

Compare multiple scholars

You can also compare multiple scholars, for example, a Feynman/Hawking battle royale:

library(ggplot2)

# Compare Richard Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')

# Compare their career trajectories, based on year of first citation
df <- compare_scholar_careers(ids)
ggplot(df, aes(x=career_year, y=cites)) + geom_line(aes(linetype=name)) + theme_bw()

Citation histories of Richard Feynman and Stephen Hawking

Predicting future h-index values

A scholar’s h-index is n if they have published at least n papers that have been cited at least n times each. Acuna et al. published a method for predicting future h-index values based on historical citation rates. The original regressions were calibrated on neuroscience researchers, so applying the method in other fields may well end up predicting negative h-indices. However, there is an optional argument that allows you to redefine the ‘top’ journals in your field. No guarantees, but still, it’s a bit of fun.

## Predict Daniel Acuna's h-index
id <- 'GAi23ssAAAAJ'
predict_h_index(id)
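
To make the h-index definition above concrete, here is a minimal sketch in base R that computes it from a vector of per-paper citation counts. The function name h_index and the citation counts are hypothetical and only illustrate the definition; this is not how the scholar package or the Acuna et al. prediction works internally.

```r
# Hypothetical helper: h-index from per-paper citation counts.
h_index <- function(cites) {
  # Rank papers from most to least cited
  sorted <- sort(cites, decreasing = TRUE)
  # The paper at rank i counts towards h if it has at least i citations;
  # because the counts are sorted, the number of such ranks is the h-index
  sum(sorted >= seq_along(sorted))
}

# Made-up citation counts for six papers
cites <- c(25, 8, 5, 3, 3, 0)
h_index(cites) # 3
```

Here three papers have at least three citations each, but there are not four papers with at least four citations, so the h-index is 3.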

That’s it! If you have any suggestions for new features, comments, etc, please let me know.
