132 search results for "web scraping"

The R-Podcast Episode 10: Adventures in Data Munging Part 2

September 16, 2012
By

I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for

Read more »

UseR 2012 highlights

June 20, 2012
By
UseR 2012 highlights

The eighth annual R user conference, UseR! 2012, has come and gone — and what an event it was! I've been to five useR! conferences so far, and each one improves upon the last. This year's conference at Vanderbilt was the best so far: an outstanding location (my first visit to Nashville, a great city), excellent facilities (the lecture...

Read more »

Visualizing the CRAN: Graphing Package Dependencies

May 17, 2012
By
Visualizing the CRAN:  Graphing Package Dependencies

I had been meaning to start toying with the igraph package for a while. So a few weeks ago (lay off, I'm busy), I decided to grab a bunch of CRAN data about package dependencies. The easiest way that I could think to get this information was to just grab the html files for all the package descriptions and...

Read more »

118 years of US State Weather Data

April 22, 2012
By
118 years of US State Weather Data

A recent post on the Junkcharts blog looked at US weather dataand the importance of explaining scales (which in this case went up to 118). Ultimately, it turns out that 118 is the rank of the data compared to the previous 117 years of data (in ascending order, so that 118 is the highest). At … Continue reading...

Read more »

The 50 most used R packages

April 5, 2012
By
The 50 most used R packages

Ask anyone what makes R a great language, one argument that often comes back is its very active community. Proof is the impressive number of packages contributed by developers from all horizons and backgrounds. The CRAN website alone lists 3,725 p...

Read more »

RStudio Development Environment

March 23, 2012
By
RStudio Development Environment

Compared to many other languages of equal popularity, there are realtively few development environments for R. In fact, the total number of production ready R IDEs could probably be counted on one hand. That deficiency is a small price to pay to use R and if you’re not already accustomed to using IDEs for other The post RStudio...

Read more »

R: A Quick Scrape of Top Grossing Films from boxofficemojo.com

January 13, 2012
By
R: A Quick Scrape of Top Grossing Films from boxofficemojo.com

  Introduction I was looking at a list of the top grossing films of all time (available from boxofficemojo.com) and was wondering what kind of graphs I would come up with if I had that kind of data. I still don’t know what kind of graphs I’d construct other than a simple barplot but figured

Read more »

Installing quantstrat from R-forge and source

January 10, 2012
By
Installing quantstrat from R-forge and source

R is used extensively in the financial industry; many of my recent clients have been working in or developing products for the financial sector. Some common applications are to use R to analyze market data and evaluate quantitative trading strategies. Custom solutions are almost always the best way to do this, but the quantstrat package The post Installing...

Read more »

Analyzing R-bloggers

January 6, 2012
By
Analyzing R-bloggers

In the last two posts we saw how to download posts from R-bloggers, and then extract the title, author and date of each post and write that information to a csv file. Since we now have a nice data set from r-bloggers, we can start to examine the develo...

Read more »

Mapping the Iowa GOP 2012 Caucus Results

January 4, 2012
By
Mapping the Iowa GOP 2012 Caucus Results

Introduction On Tuesday January 3rd 2012 the Iowa Republican party held it’s presidential caucuses, with Mitt Romney beating Rick Santorum by 8 votes as of noon on Jan 4th. This was an exciting race with multiple lead changes and entrance polling showing many late undecideds and large gaps in candidate support by age and income.

Read more »