New R package: scholar

October 23, 2013

[This article was first published on James Keirstead » Rstats, and kindly contributed to R-bloggers.]

My new R package, scholar, has just been posted on CRAN.

The scholar package provides functions to extract citation data from Google Scholar. In addition to retrieving basic information about a single scholar, the package also allows you to compare multiple scholars and predict future h-index values. There’s a full guide on GitHub (along with the source code), but here are some quick highlights.

Get profile data on a scholar

Not everyone has a Google Scholar profile page, but if they do, you can find it using the search box in the corner of any profile page. The URL of the resulting profile will contain a string that looks like user=B7vSqZsAAAAJ. The part after user= is the scholar's id, and the package references scholars by that id. So, for example, here is Richard Feynman’s data:

library(scholar)
id <- 'B7vSqZsAAAAJ'
feynman <- get_profile(id)
feynman$name # Prints out his name
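
If you would rather not copy the id out of the URL by hand, a small helper can do it. This is only a sketch: extract_scholar_id is a hypothetical function of my own naming, not part of the scholar package, and the example URL simply assumes the usual user= query parameter described above.

```r
# Hypothetical helper (NOT part of the scholar package): pull the value of
# the `user` query parameter out of a Google Scholar profile URL.
extract_scholar_id <- function(url) {
  # Look for "user=" right after "?" or "&", and capture everything up to
  # the next "&" or "#"
  m <- regmatches(url, regexec("[?&]user=([^&#]+)", url))[[1]]
  if (length(m) < 2) {
    stop("No 'user=' parameter found in URL: ", url)
  }
  m[2]
}

# Example URL, assumed for illustration
url <- "https://scholar.google.com/citations?user=B7vSqZsAAAAJ&hl=en"
id <- extract_scholar_id(url)
```

The resulting id can then be passed to get_profile(id) exactly as in the snippet above.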

Compare multiple scholars

You can also compare multiple scholars, for example, a Feynman/Hawking battle royale:

library(ggplot2)  # needed for the plot below

# Compare Richard Feynman and Stephen Hawking
ids <- c('B7vSqZsAAAAJ', 'qj74uXkAAAAJ')

# Compare their career trajectories, based on year of first citation
df <- compare_scholar_careers(ids)
ggplot(df, aes(x=career_year, y=cites)) + geom_line(aes(linetype=name)) + theme_bw()

Citation histories of Richard Feynman and Stephen Hawking
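
To see what "based on year of first citation" means in practice, here is a toy version of the alignment in base R. It is an assumption about the general idea, not the package's actual implementation: the data frame is made up, the scholars "A" and "B" are hypothetical, and counting the first cited year as career year 0 is my own choice for the sketch.

```r
# Made-up yearly citation counts for two hypothetical scholars
# (illustration only, not real Google Scholar data)
cites <- data.frame(
  name  = c("A", "A", "A", "B", "B", "B"),
  year  = c(1950, 1951, 1952, 1970, 1971, 1972),
  cites = c(5, 12, 20, 3, 9, 30)
)

# Year of first citation for each scholar, repeated on every row of that scholar
first_year <- ave(cites$year, cites$name, FUN = min)

# Shift each scholar's timeline so it starts at their first cited year
# (assumption: the first year is counted as career year 0)
cites$career_year <- cites$year - first_year
```

After the shift, both scholars share the same x-axis, so careers that started decades apart can be drawn on one plot.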

Predicting future h-index values

A scholar’s h-index is the largest number n such that they have published at least n papers that have each been cited at least n times. Acuna et al. published a method for predicting future h-index values based on historical citation rates. The original regressions were calibrated on neuroscience researchers, so using this method in other fields may well end up predicting negative h-indices. However, there is an optional argument that allows you to redefine the ‘top’ journals in your field. No guarantees, but still, it’s a bit of fun.


## Predict Daniel Acuna's h-index
id <- 'GAi23ssAAAAJ'
predict_h_index(id)
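
To make the h-index definition itself concrete, here is a minimal base R sketch that computes it from a vector of per-paper citation counts. The function name h_index and the example counts are mine, for illustration only; this is not how the package gets the value and it has nothing to do with the Acuna et al. prediction.

```r
# Illustrative helper (NOT part of the scholar package).
# cites: numeric vector with one citation count per paper.
h_index <- function(cites) {
  if (length(cites) == 0) return(0L)
  # Rank papers from most to least cited
  sorted <- sort(cites, decreasing = TRUE)
  # The paper at rank i counts toward h when it has at least i citations;
  # because the counts are sorted, these papers form a prefix, so summing works
  sum(sorted >= seq_along(sorted))
}

# Five papers with made-up citation counts
h_index(c(10, 8, 5, 4, 3))  # 4: four papers have at least 4 citations each
```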

That’s it! If you have any suggestions for new features, comments, etc, please let me know.
