Managing memory in a list of lists data structure

April 3, 2013

(This article was first published on Nathan VanHoudnos » rstats, and kindly contributed to R-bloggers)

First, a confession: instead of using classes and defining methods for them, I build a lot of ad hoc data structures out of lists and then write one-off functions that operate on those lists of lists. I think this is a Perl-ism that has transferred into my R code. I might eventually learn how to do classes, but this hack has been working well enough.

One issue I ran into today is that it was getting tedious to find out which objects stored in the list of lists were taking up the most memory. I ended up writing this rather silly recursive function that may be of use to you if you have also been scarred by Perl.

# A hacked together function for exploring these structures
get.size <- function( obj.to.size, units='Kb') {
  # Check if the object we were passed is a list
  # N.B. Since is(list()) returns c('list', 'vector') we need a
  #      multiple value comparison like all.equal
  # N.B. Since all.equal will either return TRUE or a vector of 
  #      differences wrapping it in is.logical is the same as 
  #      checking if it returned TRUE. 
  if ( is.logical( all.equal( is(obj.to.size) , is(list())))) {
    # Iterate over each element of the list
    lapply( obj.to.size ,
      function(xx){
        # Calculate the size of the current element of the list
        # N.B. object.size always returns bytes, but its print 
        #      allows different units. Using capture.output allows
        #      us to do the conversion with the print method
        the.size <- capture.output(print(object.size(xx), units=units))
        # This object may itself be a list...
        if( is.logical( all.equal( is(xx), is(list())))) {
           # if so, recurse if we aren't already at zero size 
           if( the.size != paste(0, units) ) {
             the.rest <- get.size( xx , units)
             return( list(the.size, the.rest) )
           }else {
             # Or just return the zero size
             return( the.size )             
           }
        } else {
           # the element isn't a list, just return its size
           return( the.size)
        }
      })
  } else {
    # If the object wasn't a list, signal an error.
    stop("The object passed to this function was not a list.")
  }
}
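
For comparison, here is a minimal sketch of the same idea written more compactly. It is not the function above, only an illustration under a few assumptions: the name size.tree and the toy list are hypothetical, is.list() combined with !is.object() stands in for the is()/all.equal() comparison (so that only bare lists are recursed into, not classed objects such as fitted models), and format() on the result of object.size() replaces the capture.output(print(...)) step. It also recurses into any non-empty list rather than testing for a printed size of zero.

```r
# Hypothetical compact variant of get.size(); a sketch, not the original code.
size.tree <- function(x, units = "Kb") {
  stopifnot(is.list(x))
  lapply(x, function(el) {
    # format() on an object_size value does the unit conversion directly
    the.size <- format(object.size(el), units = units)
    # Recurse only into non-empty bare lists; classed objects that are
    # built on lists (data frames, model fits) are treated as leaves
    if (is.list(el) && !is.object(el) && length(el) > 0) {
      list(the.size, size.tree(el, units))
    } else {
      the.size
    }
  })
}

# Toy data, made up for illustration
toy <- list(
  numbers = rnorm(1000),
  nested  = list(a = 1:10, b = letters)
)
str(size.tree(toy))
```

The result has the same shape as the output of get.size: a size string for each leaf, and a two-element list (total, then breakdown) for each nested list.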

The output looks something like this:

$models
$models[[1]]
[1] "2487.7 Kb"

$models[[2]]
$models[[2]]$naive.model
[1] "871 Kb"

$models[[2]]$clustered.model
[1] "664.5 Kb"

$models[[2]]$gls.model
[1] "951.9 Kb"



$V
[1] "4628.2 Kb"

$fixed.formula
[1] "1.2 Kb"

$random.formula
[1] "2.6 Kb"

Each nested list is reported as a pair: the first element is the total size of that list, and the second element breaks that total down by component. Therefore, the whole “models” is 2487.7 Kb and “models$naive.model” is only 871 Kb of that total.
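
Since the original goal was to find the biggest objects, it can also help to flatten the hierarchy into a single sorted vector. The following is a rough sketch of one way to do that; the helper name flat.sizes, the path labels, and the toy list are all hypothetical and not part of the function above. It keeps sizes as raw bytes so they can be sorted numerically.

```r
# Hypothetical helper: one entry per leaf of a list of lists,
# named by its path and sorted by size in bytes (largest first).
flat.sizes <- function(x, prefix = "") {
  stopifnot(is.list(x))
  # Fall back to positional labels when elements are unnamed
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  unnamed <- which(nms == "")
  nms[unnamed] <- paste0("[[", unnamed, "]]")

  out <- numeric(0)
  for (i in seq_along(x)) {
    path <- if (prefix == "") nms[i] else paste(prefix, nms[i], sep = "$")
    el <- x[[i]]
    if (is.list(el) && !is.object(el) && length(el) > 0) {
      # Bare, non-empty list: descend and collect its leaves
      out <- c(out, flat.sizes(el, path))
    } else {
      # Leaf: record its size in bytes under its path
      out[path] <- as.numeric(object.size(el))
    }
  }
  sort(out, decreasing = TRUE)
}

# Toy data, made up for illustration
toy <- list(
  big    = numeric(1e5),
  nested = list(small = 1:10, medium = numeric(1e3))
)
sizes <- flat.sizes(toy)
print(round(sizes / 1024, 1))  # sizes in Kb
```

Note that the leaf sizes will generally not add up exactly to the size of the enclosing list, because the list itself and its names take up some memory too.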
