The R-Podcast Episode 10: Adventures in Data Munging Part 2

September 16, 2012

(This article was first published on The R-Podcast, and kindly contributed to R-bloggers)

I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for storage in a MySQL database. Our listener feedback segment contains another installment of the Pitfalls of R, contributed by listener Frans. I want to thank everyone who has provided such positive feedback throughout the season, and I’m looking forward to providing some exciting new content for season 2. I hope you enjoy the episode, and check out our new contact page if you would like to provide any feedback. Thanks for listening!
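For readers who want a feel for the workflow before listening, here is a minimal sketch of the general approach, not the exact code from the episode. It assumes the XML package is installed, and it parses a small inline HTML string with invented rows instead of a live page, so the table contents, the column names, and the `save_stats()` helper with its database and table names are all hypothetical.

```r
library(XML)  # provides htmlParse() and readHTMLTable()

# Hypothetical stand-in for a downloaded stats page (invented rows, not real data).
page <- '<html><body><table id="stats">
  <tr><th>Player</th><th>GP</th><th>G</th></tr>
  <tr><td>Player A</td><td>82</td><td>30</td></tr>
  <tr><td>Player B</td><td>75</td><td>12</td></tr>
</table></body></html>'

# For a live page, one could first check the address with RCurl::url.exists(url)
# and download it with RCurl::getURL(url) before parsing.
doc <- htmlParse(page, asText = TRUE)

# readHTMLTable() on a parsed document returns a list with one element per <table>.
tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Validate before importing: stop early if the page holds no table at all.
if (length(tables) == 0) stop("no HTML table found on the page")
stats <- tables[[1]]

# Minor processing: every column arrives as character, so convert the counts.
stats$GP <- as.integer(stats$GP)
stats$G  <- as.integer(stats$G)

# Hypothetical helper (not called here): needs a running MySQL server, the DBI
# and RMySQL packages, and made-up database and table names.
save_stats <- function(df, dbname = "hockey", table = "player_stats") {
  con <- DBI::dbConnect(RMySQL::MySQL(), dbname = dbname)
  on.exit(DBI::dbDisconnect(con))
  DBI::dbWriteTable(con, table, df, append = TRUE, row.names = FALSE)
}
```

Checking that a table was actually returned before touching it is the cheap part of validation; a real import would probably also confirm the expected column names before writing anything to the database.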

Episode 10 Time Stamps

00:00 The R-Podcast #010 Adventures in Data Munging Part 2
00:33 Introduction
01:50 Wrapping up season 1 ... wait, what?
03:30 RStudio team expands
05:41 R Community milestone
07:53 Discovering hockey-reference.com 
10:54 Tips for readHTMLTable
21:10 Checking for valid data first
29:23 Minor processing needed
35:18 Saving data to MySQL database
45:26 Listener Feedback: Andrew
54:58 Frans: Pitfalls of R segment 2
63:40 Wrapping up: subscribe to the podcast, [email protected], +1-269-849-9780, Twitter @theRcast
69:14 End
