Scraping table from html web with CloudStat

January 12, 2012
By

(This article was first published on CloudStat, and kindly contributed to R-bloggers)

You need to use the data from internet, but don’t type, you can just extract or scrape them if you know the web URL.

Thanks to XML package from R. It provides amazing readHTMLtable() function.

For a study case,

I want to scrape data:

  1. US Airline Customer Score.
  2. World Top Chess Players (Men).

A. Scraping US Airline Customer Score table from
http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines

Code:

airline = ‘http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines’
airline.table = readHTMLTable(airline, header=T, which=1,stringsAsFactors=F)

Result:

B. Scraping World Top Chess players (Men) table from http://ratings.fide.com/top.phtml?list=men

Code:

chess = ‘http://ratings.fide.com/top.phtml?list=men’
chess.table = readHTMLTable(chess, header=T, which=5,stringsAsFactors=F)

Result:

Done. You had successfully scraping data from any web page with CloudStat.

You can get the full version of this study case (code and result) at Scraping table from html web.

Then, you can analyze as usual! Great! No more retype the data. Enjoy!

To leave a comment for the author, please follow the link and comment on his blog: CloudStat.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , ,

Comments are closed.