I have big data problems.

I need to analyze hundreds of millions of rows of data, and I spent two weeks testing whether R can handle this. My assessment so far from those experiments:

**1)** R is best for data that fits in a computer’s RAM (so get more RAM if you can).

**2)** R can be used for datasets that don’t fit into RAM by using the bigmemory and ff packages. However, this approach works well only for datasets smaller than 15 GB. This is in line with the excellent analysis done by Ryan. There is also another good tutorial for bigmemory.

**3)** If we need to analyze datasets larger than 15 GB, then SAS, MapReduce and an RDBMS 🙁 seem like the only options, as they store data on the file system and read it as needed.
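To make the "store data on the file system and read it as needed" idea in point 3 concrete, here is a minimal sketch in Python. It is only an illustration of the general streaming pattern, not how SAS, MapReduce or any particular RDBMS is implemented. The function names and the `region` and `sales` columns are hypothetical. The file is read one row at a time and only a small running total per group is kept, so memory use depends on the number of groups rather than the number of rows.

```python
import csv
import os
import tempfile
from collections import defaultdict


def streaming_group_mean(path, key_col, value_col):
    """Return the mean of value_col per key_col, reading one row at a time."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    with open(path, newline="") as handle:
        # DictReader yields rows lazily, so the whole file is never in memory.
        for row in csv.DictReader(handle):
            key = row[key_col]
            sums[key] += float(row[value_col])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def demo():
    """Write a tiny hypothetical CSV file and aggregate it from disk."""
    rows = [("east", 10), ("west", 30), ("east", 20), ("west", 50)]
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", delete=False, newline=""
    )
    try:
        writer = csv.writer(tmp)
        writer.writerow(["region", "sales"])  # hypothetical column names
        writer.writerows(rows)
        tmp.close()
        return streaming_group_mean(tmp.name, "region", "sales")
    finally:
        tmp.close()
        os.remove(tmp.name)


if __name__ == "__main__":
    print(demo())
```

The same pattern extends to any statistic that can be updated incrementally (counts, sums, means, minima and maxima). Statistics that need all the data at once, such as exact medians, are harder to compute this way.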

Since MapReduce implementations are still clumsy and not business-friendly, I wonder if it’s time to explore commercial analytics tools like SAS for big data analytics.

Can Stata, Matlab or RevolutionR analyse datasets in the range of 50–100 GB effectively?

**References**

http://www.bytemining.com/2010/07/taking-r-to-the-limit-part-i-parallelization-in-r/

http://www.austinacl.blogspot.com (image)
