R tips for moderately large data

[This article was first published on Robert Grant's stats blog » R, and kindly contributed to R-bloggers].

Some useful tips recently featured on R-bloggers and originally posted at Mollie’s Research Blog are worth reading. I say moderately large because I don’t really believe there is such a thing as big data (and it looks like Mollie doesn’t either, judging by the judicious use of the word ‘large’), but there are special computational problems that appear as you go large. Maybe in ten years we’ll laugh at those problems, but I suspect the data will have kept pace just ahead of our capabilities.

For example, did you know that by specifying the class of each variable (string, integer and so on) when opening a file in R, you can cut the time taken nearly in half? I certainly didn’t. What about not bothering to open it at all if it’s already in memory? That’s a good idea too. I’ll be keeping an eye on the blog for more top tips.
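Here is a rough sketch of how those two tips might look together. It assumes the first tip refers to the colClasses argument of read.csv(), which I believe is the usual way to declare column types in base R; the file, the column names and the object name scores are all made up for illustration, and any speed-up will depend on your data.

```r
# Hypothetical example file, written to a temporary location so the
# sketch is self-contained. Replace with your own (much larger) file.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5, name = letters[1:5], score = c(1.5, 2, 3.25, 4, 5)),
  path,
  row.names = FALSE
)

# Tip 1: declare the class of each column, in file order, so that
# read.csv() does not have to work the types out for itself.
col_types <- c("integer", "character", "numeric")

# Tip 2: skip the read entirely if the object is already in memory.
# "scores" is an assumed object name for this illustration.
if (!exists("scores")) {
  scores <- read.csv(path, colClasses = col_types)
}

str(scores)
```

On a file this small the difference will be invisible; the point is only the shape of the call. Wrapping each version in system.time() on a genuinely large file is a simple way to check whether the saving holds for your own data.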

It would be interesting to see how many of these have parallels in other stats software.

