Posts Tagged ‘ web scraping ’

Web-Scraping in R

April 2, 2012

Web-scraping, or web-crawling, sounds like a seedy activity worthy of an Interpol investigative department. The reality, however, is far less nefarious. Web-scraping is any procedure by which someone extracts data from the internet. Given that it’s possible to get the internet on computers these days, web-scraping opens an array of interesting possibilities to social-science researchers.

Read more »

An unabashedly narcissistic data analysis of my own tweets. The…

April 2, 2012

pie( table( whence.i.tweet )); qplot( whence ) + coord_polar(); pie( log( table( whence ))) + RColorBrewer; ggplot (see below); plot( density( tweets.len )); qplot(... stat="density") + geom_density; qplot(...stat="bin") + geom_text(...); tweeple; tweep...

Read more »

Playing with XML-Package: Get No. of Google Search Hits with R

March 30, 2012

GoogleHits <- function(input) { require(XML) require(stringr) require(RCurl) url

Read more »

Scraping table from any web page with R or CloudStat

January 15, 2012

When you need data from the internet, you don’t have to type it in by hand: if you know the page’s URL, you can simply extract, or scrape, it. This is thanks to the XML package for R, which provides the amazing readHTMLTable() function. For...

Read more »
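As a rough illustration of the readHTMLTable() idea in the excerpt above, here is a minimal sketch. It assumes the XML package is installed, and it parses a small inline HTML string (made-up data) rather than a live page so that it runs offline. It is not the code from the original post.

```r
library(XML)

# Inline HTML standing in for a downloaded page (hypothetical data).
html <- '<html><body>
<table>
  <tr><th>name</th><th>count</th></tr>
  <tr><td>alpha</td><td>10</td></tr>
  <tr><td>beta</td><td>20</td></tr>
</table>
</body></html>'

# Parse the text into an HTML document, then pull out every <table>.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# readHTMLTable() returns a list with one data frame per table.
first <- tables[[1]]

# Cells arrive as character strings, so convert numeric columns explicitly.
first$count <- as.numeric(first$count)
print(first)
```

For a real page you would parse the downloaded page source in place of the inline string, then pick the table you want out of the returned list.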

An R function to analyze your Google Scholar Citations page

November 23, 2011

Google Scholar has now made Google Scholar Citations profiles available to anyone. You can read about these profiles and set one up for yourself here. I asked John Muschelli and Andrew Jaffe to write me a function that would download my Google Scholar...

Read more »

Popular Baby Names Walk-Through Part 2 – Graphing the fast movers

November 21, 2011

I will assume you have read through part 1 and have the CSV file loaded. While we covered some basic graphing in the last post, I hope to get into a little more of the data crunching. Specifically, I am interested in the names which were driven by a spe...

Read more »

GScholarXScraper: Hacking the GScholarScraper function with XPath

November 13, 2011

Kay Cichini recently wrote a word-cloud R function called GScholarScraper on his blog which, when given a search string, will scrape the associated search results returned by Google Scholar, across pages, and then produce a word-cloud visualisation. This was of interest to me because around the same time I posted an independent Google Scholar scraper function get_google_scholar_df()

Read more »
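The word-cloud step mentioned above boils down to counting terms in the scraped titles. Here is a minimal base-R sketch of that counting step, using made-up titles rather than real Google Scholar results. It is not the code of either function named above.

```r
# Hypothetical titles standing in for scraped search results.
titles <- c("Web scraping with R",
            "Scraping tables in R",
            "Text mining basics")

# Lower-case the titles and split them on anything that is not a letter.
words <- unlist(strsplit(tolower(titles), "[^a-z]+"))

# Drop very short tokens (and any empty strings left over from the split).
words <- words[nchar(words) > 2]

# Count each term and sort from most to least frequent.
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

A word-cloud plotting function would then take the term names and their counts from freq as its input.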

R-Function GScholarScraper to Webscrape Google Scholar Search Result

November 9, 2011

Based on my previous post on web scraping, I coded and uploaded the function "GScholarScraper" HERE for testing! The function will pull all (!) results, processing pages in chunks of 100 results/titles, and return a file with all titles, links, etc. It w...

Read more »

Web Scraping Google Scholar: Part 2 (Complete Success)

November 8, 2011

This is a follow-up to a post I uploaded earlier today about web scraping data off Google Scholar. In that post I was frustrated because I’m not smart enough to use xpathSApply to get the kind of results I wanted. However, fast-forward to the evening whilst having dinner with a friend, as a passing remark,

Read more »

Web Scraping Google Scholar (Partial Success)

November 8, 2011

I wanted to scrape the information returned by a Google Scholar web search into an R data frame as a quick XPath exercise. The following will successfully extract the ‘title’, ‘url’, ‘publication’ and ‘description’. If any of these fields are not available, as in the case of a citation, the corresponding cell in the data

Read more »
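To give a feel for the XPath approach described in the excerpt above, here is a minimal sketch that extracts fields per result and tolerates missing ones. It assumes the XML package is installed. The HTML, the class names "result" and "pub", and the helper first_or_na() are all hypothetical; this is not Google Scholar's real markup, nor the original post's code, and using NA for a missing field is an assumption of this sketch.

```r
library(XML)

# Made-up search results: the second one has no link and no publication line.
html <- '<html><body>
<div class="result"><h3><a href="http://example.org/a">First paper</a></h3><div class="pub">Journal A, 2010</div></div>
<div class="result"><h3>Second paper (citation only)</h3></div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# One node per search result; later XPath queries are relative to each node.
results <- getNodeSet(doc, "//div[@class='result']")

# Hypothetical helper: first match of a relative XPath, or NA if nothing matches.
first_or_na <- function(node, path, fun = xmlValue) {
  hit <- xpathSApply(node, path, fun)
  if (length(hit) == 0) NA_character_ else as.character(hit[[1]])
}

get_href <- function(a) xmlGetAttr(a, "href")

df <- data.frame(
  title       = vapply(results, function(n) first_or_na(n, "./h3"), character(1)),
  url         = vapply(results, function(n) first_or_na(n, "./h3/a", get_href), character(1)),
  publication = vapply(results, function(n) first_or_na(n, "./div[@class='pub']"), character(1)),
  stringsAsFactors = FALSE
)
print(df)
```

Querying each result node separately, instead of running one XPath over the whole page per field, keeps the columns aligned when a field is missing from some results.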