Some useful tips on working with moderately large data sets in R, recently featured on r-bloggers and originally posted at Mollie’s Research Blog, are worth reading. I say moderately large because I don’t really believe there is such a thing as big data (and, judging by the judicious use of the word ‘large’, it looks like Mollie doesn’t either), but there are particular computational problems that appear as you go large. Maybe in ten years we’ll laugh at those problems, but I suspect the data will have kept pace, staying just ahead of our capabilities.
For example, did you know that by specifying the class of each variable (character, integer and so on) when reading a file into R, you can cut the time taken nearly in half? I certainly didn’t. What about not bothering to load it at all if it’s already in memory? That’s a good idea too. I’ll be keeping an eye on the blog for more top tips.
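Here is a minimal sketch of both tips in base R. It assumes a small CSV file. The file, its columns and the object name `survey_data` are made up for illustration, and any speed-up you see will depend on the size and shape of your own data.

```r
# Hypothetical data: a small CSV written to a temporary file so the
# example runs on its own. Substitute the path to your own file.
path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:5, name = letters[1:5], score = c(1.5, 2, 3.25, 4, 5)),
  path, row.names = FALSE
)

# Tip 1: give read.csv the class of each column (in column order) so it
# does not have to work out the types itself.
col_types <- c("integer", "character", "numeric")

# Tip 2: only read the file if the object is not already in the workspace.
# inherits = FALSE limits the check to the current environment, so an object
# or function with the same name elsewhere on the search path is ignored.
if (!exists("survey_data", inherits = FALSE)) {
  survey_data <- read.csv(path, colClasses = col_types)
}

str(survey_data)
```

The `inherits = FALSE` argument matters more than it looks. A bare `exists("df")`, for instance, returns `TRUE` even in a fresh session because `df` is also the F-distribution density function in the stats package.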
It would be interesting to see how many of these have parallels in other stats software.