Site icon R-bloggers

Tips and Tools you may need for working on BIG data

[This article was first published on One Tip Per Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Nowadays everyone is talking about big data. As a genomic scientist, I could feel hungry of a collection of tools more specialized for the mediate-to-big data we deal everyday.

Here are some tips I found useful when getting, processing or visualizing large data set:

1. How to download data faster than wget?

We can use wget to download the data to local disk. If it’s large, we can download with other faster alternative, such as axel, aria2.

http://www.cyberciti.biz/tips/download-accelerator-for-linux-command-line-tools.html

2. Process the data in parallel with hidden option in GNU commands


3. In R, there are several must-have tips for large data, e.g. data.table
4. How to open scatter plot with too many points in Illustrator?

This is really a problem for me as we usually have a figure with >30k dots (i.e. each dot is a gene). Even though they are highly overlapping each other, opening it in Illustrator is extremely slow. Here is a tip: http://tex.stackexchange.com/questions/39974/problem-with-a-very-heavy-eps-image-scatter-plot-too-heavy-as-eps
From that, probably a better idea is to “compress” the data before plotting it, such as merge the overlapped ones if they overlapped some %.
or this one:
http://stackoverflow.com/questions/18852395/writing-png-plots-into-a-pdf-file-in-r
or this one:
http://stackoverflow.com/questions/7714677/r-scatterplot-with-too-many-points

Still working on the post…

To leave a comment for the author, please follow the link and comment on their blog: One Tip Per Day.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.