# Managing memory in a list of lists data structure

April 3, 2013
First, a confession: instead of using classes and defining methods for them, I build a lot of ad hoc data structures out of lists and then write one-off functions that operate on those lists of lists. I think this is a Perl-ism that has transferred into my R code. I might eventually learn how to do classes, but this hack has been working well enough.

One issue I ran into today is that it was getting tedious to find out which objects stored in the list of lists were taking up the most memory. I ended up writing this rather silly recursive function, which may be of use to you if you have also been scarred by Perl.

```r
# A hacked together function for exploring these structures
get.size <- function( obj.to.size, units='Kb') {
  # Check if the object we were passed is a list
  # N.B. Since is(list()) returns c('list', 'vector') we need a
  #      multiple value comparison like all.equal
  # N.B. Since all.equal will either return TRUE or a vector of
  #      differences wrapping it in is.logical is the same as
  #      checking if it returned TRUE.
  if ( is.logical( all.equal( is(obj.to.size) , is(list())))) {
    # Iterate over each element of the list
    lapply( obj.to.size ,
      function(xx){
        # Calculate the size of the current element of the list
        # N.B. object.size always returns bytes, but its print
        #      allows different units. Using capture.output allows
        #      us to do the conversion with the print method
        the.size <- capture.output(print(object.size(xx), units=units))
        # This object may itself be a list...
        if( is.logical( all.equal( is(xx), is(list())))) {
          # if so, recurse if we aren't already at zero size
          if( the.size != paste(0, units) ) {
            the.rest <- get.size( xx , units)
            return( list(the.size, the.rest) )
          } else {
            # Or just return the zero size
            return( the.size )
          }
        } else {
          # the element isn't a list, just return its size
          return( the.size)
        }
      })
  } else {
    # If the object wasn't a list, return an error.
    stop("The object passed to this function was not a list.")
  }
}
```
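
The two building blocks described in the comments can be tried in isolation. The following is only a small sketch on a made-up structure: the names `toy`, `coefs`, and `nested` are hypothetical and not part of my actual data.

```r
library(methods)  # provides is(); attached by default in interactive sessions

# Toy structure; every name here is made up for illustration
toy <- list(coefs = rnorm(1000), nested = list(a = 1:10, b = letters))

# is() on a plain list returns a two-element vector, which is why
# the function compares it against is(list()) with all.equal
is(toy)

# The same test the function uses, applied to a list and to a non-list
plain.list  <- is.logical(all.equal(is(toy), is(list())))
not.a.list  <- is.logical(all.equal(is(toy$coefs), is(list())))

# object.size() reports bytes; the print method converts the units,
# and capture.output() turns what was printed into a string
size.string <- capture.output(print(object.size(toy$coefs), units = "Kb"))
size.string
```

Here `plain.list` comes out `TRUE` and `not.a.list` comes out `FALSE`, because `all.equal` returns a character description of the mismatch for the numeric vector.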


The output looks something like this:

```
$models
$models[[1]]
[1] "2487.7 Kb"

$models[[2]]
$models[[2]]$naive.model
[1] "871 Kb"

$models[[2]]$clustered.model
[1] "664.5 Kb"

$models[[2]]$gls.model
[1] "951.9 Kb"



$V
[1] "4628.2 Kb"

$fixed.formula
[1] "1.2 Kb"

$random.formula
[1] "2.6 Kb"
```


Here the first element of each nested list is the total size of everything below it in the hierarchy. So the whole “models” list takes up 2487.7 Kb, and “models$naive.model” accounts for only 871 Kb of that total.
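
If the goal is just to spot the biggest objects, a flat, sortable result can be easier to scan than nested strings. The sketch below is one possible variant, not the function above: `flat.sizes` is a hypothetical helper, it keeps sizes as numeric bytes, and it assumes that "a list" means an object whose class is exactly `"list"` (so data frames and model objects are sized but not descended into).

```r
# Hypothetical helper: walk a nested list and return one named
# numeric vector of sizes in bytes, labelled by the path to each element
flat.sizes <- function(obj, prefix = "") {
  sizes <- numeric(0)
  for (i in seq_along(obj)) {
    # Build a path that mirrors how R prints nested lists:
    # $name for named elements, [[i]] for unnamed ones
    nm <- names(obj)[i]
    named <- !is.null(nm) && !is.na(nm) && nzchar(nm)
    path <- if (named) paste0(prefix, "$", nm) else paste0(prefix, "[[", i, "]]")

    # Size of this element, including everything nested inside it
    sizes[path] <- as.numeric(object.size(obj[[i]]))

    # Assumption: only recurse into plain lists
    if (identical(class(obj[[i]]), "list")) {
      sizes <- c(sizes, flat.sizes(obj[[i]], path))
    }
  }
  sizes
}

# Made-up example data
toy <- list(models = list(small = 1:10, big = rnorm(1e4)), formula = y ~ x)
sizes <- flat.sizes(toy)

# Largest entries first, converted from bytes to Kb
round(sort(sizes, decreasing = TRUE) / 1024, 1)
```

As with the nested output, a parent entry such as `$models` includes the size of its children, so the entries should not be added together.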