Based on my previous post on web scraping, I coded and uploaded the function “GScholarScraper” HERE for testing!
The function will pull all (!) results, processing pages in chunks of 100 results/titles, and return a file with all titles, links, etc. It will also produce a word cloud using the words in the publication titles.
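The chunking and word-counting steps described above can be sketched in base R. This is only an illustration under my own assumptions, not the code inside GScholarScraper: the helper names `chunk_starts` and `title_word_freq` are hypothetical, and the sample titles are made up.

```r
# Hypothetical helper: start offsets for paging through n_results
# in chunks of chunk_size (offsets are zero-based).
chunk_starts <- function(n_results, chunk_size = 100) {
  if (n_results <= 0) return(numeric(0))
  seq(from = 0, to = n_results - 1, by = chunk_size)
}

# Hypothetical helper: word frequencies from a character vector of titles.
title_word_freq <- function(titles) {
  # lower-case, then split on anything that is not a letter
  words <- unlist(strsplit(tolower(titles), "[^a-z]+"))
  words <- words[nchar(words) > 0]
  sort(table(words), decreasing = TRUE)
}

# 250 results would need three chunks
starts <- chunk_starts(250)

# made-up titles, for illustration only
titles <- c("Web scraping with R",
            "Scraping Google Scholar",
            "Text mining in R")
freq <- title_word_freq(titles)
```

A frequency table like `freq` is the kind of input a word cloud needs; with the wordcloud package attached, something along the lines of `wordcloud(names(freq), as.integer(freq), min.freq = 1)` should draw it.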
Please try your own search strings and report errors, etc.!
You can source the function by running the following lines:
setwd(tempdir())
download.file("http://docs.google.com/uc?export=download&id=0B2wAunwURQNsM2EyYWNjOWYtZmFkMi00MmJhLWJmMzUtMjRiNGFiMWVkZmI2", destfile = "google_docs_script.txt", mode = "wb")
# read it and run an example:
source(paste(tempdir(), "/google_docs_script.txt", sep = ""))
ls() # the function should be listed
# remove files from tempdir:
unlink(dir())
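If you would rather not change your working directory or clear everything in it, the same download-and-source step can be wrapped in a small function. This is a minimal sketch of an alternative, not part of the uploaded script; `source_from_url` is a hypothetical name.

```r
# Hypothetical helper: download an R script to a temporary file,
# source it into the global environment, and delete only that file.
source_from_url <- function(url) {
  script_path <- tempfile(fileext = ".R")
  # make sure the temporary file is removed even if sourcing fails
  on.exit(unlink(script_path), add = TRUE)
  download.file(url, destfile = script_path, mode = "wb", quiet = TRUE)
  source(script_path)
  invisible(NULL)
}
```

Calling `source_from_url()` with the download link from the lines above should leave the function listed by `ls()`, without touching any other files.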
Built and runs properly under:
R version 2.13.0 (2011-04-13) and R version 2.13.2 (2011-09-30)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
 LC_COLLATE=English_United States.1252
 LC_CTYPE=English_United States.1252
 LC_MONETARY=English_United States.1252
 LC_TIME=English_United States.1252
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 stringr_0.5 tm_0.5-6 wordcloud_1.2 Rcpp_0.9.7
loaded via a namespace (and not attached):
 plyr_1.5.1 slam_0.1-23
PS: The errors reported lately (see comments) have been resolved, and the source code has been updated.