Taking R to the Limit, Part II – Large Datasets in R

[This article was first published on Byte Mining » R, and kindly contributed to R-bloggers.]

For Part I, Parallelism in R, click here.

Tuesday night I again had the opportunity to present on high performance computing in R at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets in R, and I also tied MapReduce into the talk. Unfortunately, there was too much material: I had originally planned to also cover Rhipe, using R on EC2, and sparse matrix libraries, but could not fit them in.

Slides

My edited slides are posted on SlideShare, and available for download here.

Taking R to the Limit (High Performance Computing in R), Part 2: Large Datasets, LA R Users' Group 8/17/10
View more presentations from Ryan Rosario.

Topics included:

Code

The corresponding demonstration code is here.

Data

Since this talk discussed large datasets, I used some, well, large datasets. Some demonstrations used toy data, including trees and the famous iris dataset, both of which ship with R in the datasets package. To load these, just call data(iris) or data(trees).
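As a quick illustration of loading the toy data, here is a minimal sketch. It is not taken from the talk's demonstration code; it only assumes a standard R installation, where the datasets package is attached by default.

```r
# iris and trees live in the 'datasets' package that ships with R.
# They are datasets, not packages, so data() is the right call rather than library().
data(iris)
data(trees)

# Check the dimensions of each data frame.
dim(iris)    # 150 rows, 5 columns
dim(trees)   # 31 rows, 3 columns

# Peek at the structure and the first few rows.
str(iris)
head(trees)

# Report how much memory each object occupies, which shows how small these are.
print(object.size(iris), units = "Kb")
print(object.size(trees), units = "Kb")
```

Because these datasets are lazy-loaded, referring to iris or trees directly also works without the data() call; the explicit call just copies them into the workspace.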

Large datasets:

Video

The video was created with Vara ScreenFlow, and I am very happy with how easy it was to use and how painless the editing was.
