R tips for moderately large data

September 16, 2013

(This article was first published on Robert Grant's stats blog » R, and kindly contributed to R-bloggers)

Some useful tips recently featured on r-bloggers and originally posted at Mollie’s Research Blog are worth reading. I say moderately large because I don’t really believe there is such a thing as big data (and it looks like Mollie doesn’t either, judging by the judicious use of the word ‘large’), but there are special computational problems that appear as you go large. Maybe in ten years we’ll laugh at those problems, but I suspect the data will have kept pace, staying just ahead of our capabilities.

For example, did you know that by specifying the class of each variable (character, integer and so on) when reading a file into R, via the colClasses argument of read.csv, you can cut the time taken nearly in half? I certainly didn’t. What about not bothering to read the file at all if the data are already in memory? That’s a good idea too. I’ll be keeping an eye on the blog for more top tips.
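
Here is a minimal sketch of both tips together. It is my own illustration rather than Mollie's code: the file, the column names and the object name scores_df are all made up for the example, and how much time you actually save will depend on the size and shape of your data.

```r
# Write a small example file so the sketch is self-contained.
# (The path and the columns are hypothetical, for illustration only.)
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, name = c("a", "b", "c"), score = c(1.5, 2.5, 3.5)),
  path,
  row.names = FALSE
)

# Declare each column's class up front so read.csv() does not have to guess.
col_classes <- c(id = "integer", name = "character", score = "numeric")

# Only read the file if the object is not already in the workspace.
if (!exists("scores_df")) {
  scores_df <- read.csv(path, colClasses = col_classes)
}

# Check that the columns came in with the classes we asked for.
str(scores_df)
```

On a file this small the difference is invisible, of course; wrapping the read.csv() call in system.time(), with and without colClasses, on one of your own larger files is the way to see what you gain.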

It would be interesting to see how many of these have parallels in other stats software.

