Blog Archives

GScholarXScraper: Hacking the GScholarScraper function with XPath

November 13, 2011
By
GScholarXScraper: Hacking the GScholarScraper function with XPath

Kay Cichini recently wrote a word-cloud R function called GScholarScraper on his blog which when given a search string will scrape the associated search results returned by Google Scholar, across pages, and then produce a word-cloud visualisation. This was of interest to me because around the same time I posted an independent Google Scholar scraper function  get_google_scholar_df()

Read more »

Web Scraping Google+ via XPath

November 11, 2011
By
Web Scraping Google+ via XPath

Google+ just opened up to allow brands, groups, and organizations to create their very own public Pages on the site. This didn’t bother me to much but I’ve been hearing a lot about Google+ lately so figured it might be fun to set up an XPath scraper to extract information from each post of a status

Read more »

Web Scraping Yahoo Search Page via XPath

November 10, 2011
By
Web Scraping Yahoo Search Page via XPath

Seeing as I’m on a bit of an XPath kick as of late, I figured I’d continue on scraping search results but this time from Yahoo.com Rolling my own version of xpathSApply to handle NULL elements seems to have done the trick and so far it’s been relatively easy to do the scraping. I’ve created

Read more »

Facebook Graph API Explorer with R

November 10, 2011
By
Facebook Graph API Explorer with R

I wanted to play around with the Facebook Graph API  using the Graph API Explorer page as a coding exercise. This facility allows one to use the API with a temporary authorisation token. Now, I don’t know how to make an R package for the proper API where you have to register for an API key and

Read more »

Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011
By
Web Scraping Google Scholar: Part 2 (Complete Success)

This is a followup to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However fast-forward to the evening whilst having dinner with a friend, as a passing remark,

Read more »

Web Scraping Google Scholar (Partial Success)

November 8, 2011
By

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract  the ‘title’, ‘url’ , ‘publication’ and ‘description’.  If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

Read more »

Web Scraping Google URLs

November 7, 2011
By
Web Scraping Google URLs

Google slightly changed the html code it uses for hyperlinks on search pages last Thursday, thus causing one of my scripts to stop working. Thankfully, this is easily solved in R thanks to the XML package and the power and simplicity of XPath expressions: Lovely jubbly! P.S. I know that there is an API of

Read more »

Code Optimization: One R Problem, Eleven Solutions – Now Thirteen!

November 7, 2011
By
Code Optimization: One R Problem, Eleven Solutions – Now Thirteen!

Following up from my previous post “Code Optimisation: One R Problem, Ten Solutions – Now Eleven!” I figured out a twelfth solution after writing that blog post. Furthermore, half way through writing this blog post I figured out a thirteenth solution too. As a recap, the problem is taken from rwiki where the goal is to find

Read more »

Code Optimization: One R Problem, Ten Solutions – Now Eleven!

November 2, 2011
By
Code Optimization: One R Problem, Ten Solutions – Now Eleven!

Earlier this year I came across a rather interesting page about optimisation in R from rwiki. The goal was to find the most efficient code to produce strings which follow the pattern below given a single integer input n: From this we can see that the general pattern for n is: It is rather heart

Read more »

Code Optimization: One R Problem, Ten Solutions – Now Eleven!

November 1, 2011
By
Code Optimization: One R Problem, Ten Solutions – Now Eleven!

Earlier this year I came across a rather interesting page about optimisation in R from rwiki. The goal was to find the most efficient code to produce strings which follow the pattern below given a single integer input n: From this we can see that the general pattern for n is: It is rather heart

Read more »