Calculating Memory Requirements


I had a conversation with people at the office about how much memory a computer needs to handle a given data frame. It started like this: suppose you have a data frame with 2,000,000 rows and 250 columns, all of them numeric (so 2,000,000 × 250 × 8 bytes per numeric value). Roughly how much memory is required to store this data?

We can calculate the approximate memory usage from the number of rows and columns in the data:

> # bytes
> 2000000 * 250 * 8
[1] 4000000000
> # MB
> 2000000 * 250 * 8 / 2^20
[1] 3814.697
> # MB, rounded
> round(2000000 * 250 * 8 / 2^20, 2)
[1] 3814.7
> # GB
> round(2000000 * 250 * 8 / 2^20 / 1024, 2)
[1] 3.73
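
If you want to check an estimate like this against what R actually allocates, object.size() from the utils package reports the approximate memory used by an object. Below is a minimal sketch on a scaled-down data frame (the 20,000 × 250 dimensions are a made-up example, one hundredth of the size discussed above); multiplying the measured size by 100 should land close to the ~3.7 GB figure.

# sketch: measure a data frame 1/100 the size of the one above
# (20,000 rows x 250 numeric columns is a made-up example)
df_small <- as.data.frame(matrix(rnorm(20000 * 250), nrow = 20000, ncol = 250))
# should be roughly 38 MB of numeric data plus a little overhead
print(object.size(df_small), units = "MB")
# scaling up by 100 gives an empirical estimate for the full 2,000,000-row case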

We can apply the same approach to other types of data, paying attention to the number of bytes used to store each type (character, factor, and so on). A common rule of thumb is to use 10 rather than 8 as the bytes-per-numeric coefficient, to allow for overhead, and then triple that figure to estimate how much contiguous space is needed. If the data frame is just character data, the ratio is going to be around 1.1, and reading it in as factor data might bring it even below that.
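
As a rough illustration of the character/factor point, the sketch below (my own example, not from the original discussion) compares the measured size of the same values stored as a character vector and as a factor; the exact numbers depend on string length and how often values repeat.

# one million values drawn from three short labels (made-up example)
x_chr <- sample(c("low", "medium", "high"), 1e6, replace = TRUE)
x_fct <- factor(x_chr)
# character: one pointer per element, plus the cached strings themselves
print(object.size(x_chr), units = "MB")
# factor: one integer per element, plus the small vector of levels
print(object.size(x_fct), units = "MB")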

Finally, if you want to get a solid understanding of R's memory management, you will find the relevant section in Hadley Wickham's book very useful.
