In Reply to Ben Bolker’s Post "Google Scholar (still) sucks"

November 14, 2011

(This article was first published on theBioBucket*, and kindly contributed to R-bloggers)

Replying to Ben Bolker’s post Google Scholar (still) sucks:


Thanks for illustrating the issue in your post!

The main purpose of my function GScholarScraper is to retrieve titles, simply because titles are the best we can get from Google Scholar. Abstracts are truncated and thus shouldn’t be used for meta-analysis. Titles are truncated too, as you said, and there is no way around that. However, this happens less often and less severely than with abstracts.

The CSV is optional; the data frame with word frequencies and the word cloud are always returned. For any other output, one can easily add the appropriate lines to the script.
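To make the kind of output concrete, here is a minimal sketch in base R of how a word-frequency data frame could be built from a vector of titles, with the CSV written only on request. This is not the actual GScholarScraper code: the function name `title_word_freq`, its `csv_path` argument, and the example titles are all hypothetical and serve only as illustration.

```r
# Hypothetical helper for illustration only; NOT the GScholarScraper source.
title_word_freq <- function(titles, csv_path = NULL) {
  # Lower-case the titles and split on anything that is not a-z
  # (assumption: ASCII titles; accented letters would be split apart)
  words <- unlist(strsplit(tolower(titles), "[^a-z]+"))

  # Drop empty strings and very short words such as "in" or "of"
  words <- words[nchar(words) > 2]

  # Count the words and sort by decreasing frequency
  counts <- sort(table(words), decreasing = TRUE)
  freq <- data.frame(word = names(counts),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)

  # The CSV is optional: write it only when a path is supplied
  if (!is.null(csv_path)) {
    write.csv(freq, csv_path, row.names = FALSE)
  }
  freq
}

# Made-up titles standing in for a scraped query result
titles <- c("Mixed models in ecology",
            "Generalized linear mixed models",
            "Ecology of mixed forests")
freq <- title_word_freq(titles)
head(freq)
```

A data frame of this shape could then be handed to a plotting function, for instance `wordcloud::wordcloud(freq$word, freq$freq)`, assuming the wordcloud package is installed.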

My opinion:
My function is good for a quick summary and illustration of a query result.

Tony’s function is evidently better if you want to pull all fields of a given query (authors, titles, abstracts, ...).

I wonder whether people have come across ROpenSci. I guess it might be very interesting in this context!

Last remark: Of course, a Google Scholar API would resolve all our problems in this regard.

