a) split the file into several pieces (free and straightforward, but hard to maintain);
b) use MS SQL/MySQL (you have to learn it, MS SQL isn't free, and it is not straightforward).
A useful summary of the suggested solutions:
1, 1) import the large file via scan() in R;
2) convert it to a data.frame –> to keep the data formats;
3) use cast() –> to group the data into as “square” a format as possible; this step involves the reshape package, a very good one. A sketch of the whole workflow follows below.
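A minimal sketch of that three-step workflow, assuming a comma-separated file called data.csv with a header row and three columns named id, variable and value (the file name and column names are made up for illustration):

## 1) read the raw file with scan(); what = list(...) fixes each column's type
raw <- scan("data.csv", sep = ",", skip = 1,
            what = list(id = "", variable = "", value = 0))

## 2) convert the list returned by scan() into a data.frame, keeping the formats
df <- as.data.frame(raw, stringsAsFactors = FALSE)

## 3) cast the long data into a "square" (wide) table with the reshape package:
## one row per id, one column per variable
library(reshape)
wide <- cast(df, id ~ variable)
head(wide)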
2, use the bigmemory package to load the data, so in my case using read.big.matrix() instead of read.table(). There are several other interesting functions around this package, such as mwhich() as a memory-friendly replacement for which(), and foreach() (from the separate foreach package) instead of for() loops, etc. How large a file can it handle? I don't know; the authors have successfully loaded a CSV as large as 11 GB.
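A hedged sketch of the bigmemory route (the file huge.csv, its column price, and the threshold are hypothetical, and all columns are assumed numeric):

library(bigmemory)
library(foreach)

## read.big.matrix() plays the role of read.table(); the backing/descriptor
## files keep the matrix file-backed on disk rather than fully in RAM
big <- read.big.matrix("huge.csv", sep = ",", header = TRUE, type = "double",
                       backingfile = "huge.bin", descriptorfile = "huge.desc")

## mwhich() filters without building a full logical vector in memory,
## e.g. the rows where the hypothetical column "price" exceeds 100
rows <- mwhich(big, cols = "price", vals = 100, comps = "gt")

## foreach() as a replacement for a for() loop, here computing column means
col_means <- foreach(j = 1:ncol(big), .combine = c) %do% mean(big[, j])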
3, switch to a 64-bit version of R with enough memory, preferably on Linux. I can't test this solution at my office due to an administrative constraint, although it is doable, as mentioned in the R help documentation.
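A quick way to check which build you are running (a minimal sketch; memory.limit() applies to Windows builds only):

## TRUE on a 64-bit build of R
.Machine$sizeof.pointer == 8
R.version$arch        # e.g. "x86_64"

## on Windows only: the current memory limit available to R, in MB
if (.Platform$OS.type == "windows") memory.limit()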
Search & trial.