GScholarXScraper: Hacking the GScholarScraper function with XPath

November 13, 2011

Kay Cichini recently posted a word-cloud R function called GScholarScraper on his blog. Given a search string, it scrapes the associated search results returned by Google Scholar, across pages, and then produces a word-cloud visualisation. This was of interest to me because around the same time I posted ...

Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011

This is a follow-up to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However, fast-forward to the evening whilst having dinner with ...

Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the ...

Web Scraping Google URLs

November 7, 2011

Google slightly changed the HTML code it uses for hyperlinks on search pages last Thursday, thus causing one of my scripts to stop working. Thankfully, this is easily solved in R using the XML package and the power and simplicity of XPath expressions: Lovely jubbly! P.S. I know ...

