# Taking R to the Limit, Part II – Large Datasets in R

This article was first published on **Byte Mining » R** and kindly contributed to R-bloggers.


**For Part I, Parallelism in R, click here.**

Tuesday night I again had the opportunity to present on high performance computing in R at the Los Angeles R Users’ Group. This was the second part of a two-part series called “Taking R to the Limit: High Performance Computing in R.” Part II discussed ways to work with large datasets in R, and I also tied MapReduce into the talk. Unfortunately, there was too much material: I had originally planned to cover Rhipe, using R on EC2, and sparse matrix libraries as well, but there was not enough time for all of them.

**Slides**

My edited slides are posted on SlideShare, and available for download here.

Topics included:

- bigmemory, biganalytics and bigtabulate
- ff
- HadoopStreaming
- brief mention of Rhipe

**Code**

The corresponding demonstration code is here.

**Data**

Since this talk discussed large datasets, I used some, well, large datasets. Some demonstrations used toy data, including `trees` and the famous `iris` dataset, both of which ship with R in the `datasets` package. That package is attached by default, so the data are available immediately; to load them explicitly, use the call `data(iris)` or `data(trees)`.
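As a quick sanity check, here is a minimal sketch of loading the two toy datasets explicitly and printing their dimensions and in-memory size. It assumes only a standard R installation with the `datasets` package available; the size printout is there only to show how small these objects are compared with the large datasets below.

```r
# iris and trees ship in the 'datasets' package, which R attaches by default.
# data() loads them explicitly into the current workspace.
data(iris)
data(trees)

dim(iris)   # rows and columns of iris
dim(trees)  # rows and columns of trees

# In-memory footprint of each toy dataset, reported in kilobytes.
print(object.size(iris), units = "Kb")
print(object.size(trees), units = "Kb")
```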

Large datasets:

- On-Time Airline Performance data from the 2009 Data Expo. This Bash script will download all of the necessary data files and create a nice dataset for you called `airline.csv` in the directory in which it is executed. I would just post it here, but it is very large and I only have so much bandwidth!
- The Twitter dataset appears to no longer be available. Instead, use `anna.txt`, which comes with `HadoopStreaming`. Simply replace `twitter.tsv` with `anna.txt`.
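If you are not sure where `anna.txt` lives on your machine, one way to find it is to search the installed package directory. The sketch below is only an illustration: `find_pkg_file` is a hypothetical helper (not part of `HadoopStreaming`), and it assumes the package is installed and that the file is named exactly `anna.txt` somewhere under its install directory.

```r
# Hypothetical helper: search an installed package's directory tree for
# files whose name matches a regular expression.
find_pkg_file <- function(pkg, pattern) {
  root <- system.file(package = pkg)        # "" if the package is not installed
  if (!nzchar(root)) return(character(0))
  list.files(root, pattern = pattern, recursive = TRUE, full.names = TRUE)
}

# Assumption: HadoopStreaming is installed and bundles a file called anna.txt.
anna <- find_pkg_file("HadoopStreaming", "^anna\\.txt$")

if (length(anna) > 0) {
  print(anna[1])                  # full path to use in place of twitter.tsv
  print(readLines(anna[1], n = 3)) # peek at the first few lines
} else {
  message("anna.txt not found; is HadoopStreaming installed?")
}
```

The path this returns can then be used wherever the demonstration code refers to `twitter.tsv`.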

**Video**

The video was created with Vara ScreenFlow and I am very happy with how easy it is to use and how painless editing was.

