Managing memory in a list of lists data structure
First, a confession: instead of using classes and defining methods for them, I build a lot of ad hoc data structures out of lists and then write one-off functions that operate on those lists of lists. I think this is a Perl-ism that has transferred into my R code. I might eventually learn how to do classes, but this hack has been working well enough.
One issue I ran into today is that it was getting tedious to find out which objects stored in the list of lists were taking up the most memory. I ended up writing this rather silly recursive function that may be of use to you if you have also been scarred by Perl.
# A hacked together function for exploring these structures
get.size <- function( obj.to.size, units='Kb') {
  # Check if the object we were passed is a list
  # N.B. Since is(list()) returns c('list', 'vector') we need a
  #      multiple value comparison like all.equal
  # N.B. Since all.equal will either return TRUE or a vector of
  #      differences wrapping it in is.logical is the same as
  #      checking if it returned TRUE.
  if ( is.logical( all.equal( is(obj.to.size) , is(list())))) {
    # Iterate over each element of the list
    lapply( obj.to.size ,
      function(xx){
        # Calculate the size of the current element of the list
        # N.B. object.size always returns bytes, but its print
        #      allows different units. Using capture.output allows
        #      us to do the conversion with the print method
        the.size <- capture.output(print(object.size(xx), units=units))
        # This object may itself be a list...
        if( is.logical( all.equal( is(xx), is(list())))) {
          # if so, recurse if we aren't already at zero size
          if( the.size != paste(0, units) ) {
            the.rest <- get.size( xx , units)
            return( list(the.size, the.rest) )
          }else {
            # Or just return the zero size
            return( the.size )
          }
        } else {
          # the element isn't a list, just return its size
          return( the.size)
        }
      })
  } else {
    # If the object wasn't a list, return an error.
    stop("The object passed to this function was not a list.")
  }
}
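The two N.B. comments about the list check in the function above can be tried in isolation. The snippet below is only a sketch: `is.plain.list` is a hypothetical name introduced here to wrap the same test, and `is.list()` and `isTRUE()` are shown as possible alternatives, not as what the function uses.

```r
library(methods)  # provides is(); attached explicitly in case the session lacks it

# Hypothetical wrapper (not in the original post) around the same check
is.plain.list <- function(x) is.logical(all.equal(is(x), is(list())))

is(list())                                  # "list" "vector"
is.plain.list(list(a = 1))                  # TRUE: class vectors match
is.plain.list(1:3)                          # FALSE: all.equal returned differences

# isTRUE() is the usual way to turn an all.equal result into a single logical
isTRUE(all.equal(is(list()), is(list())))   # TRUE

# is.list() is shorter but broader: a data frame is a list underneath,
# while the is()/all.equal test rejects it
df <- data.frame(a = 1)
is.list(df)                                 # TRUE
is.plain.list(df)                           # FALSE
```

Which test is preferable depends on whether data frames and fitted model objects should be opened up element by element or sized as a single unit.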
The output of get.size looks something like this:
$models
$models[[1]]
[1] "2487.7 Kb"

$models[[2]]
$models[[2]]$naive.model
[1] "871 Kb"

$models[[2]]$clustered.model
[1] "664.5 Kb"

$models[[2]]$gls.model
[1] "951.9 Kb"



$V
[1] "4628.2 Kb"

$fixed.formula
[1] "1.2 Kb"

$random.formula
[1] "2.6 Kb"
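where, for each element that is itself a list, the first entry is the total size of that list, including everything below it in the hierarchy, and the second entry holds the sizes of its elements. Therefore, the whole “models” list is 2487.7 Kb and “models$naive.model” accounts for only 871 Kb of that total.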
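Because the sizes come back as formatted strings, they cannot be sorted directly. If the goal is to rank elements by size, one option is a variant that returns plain byte counts. This is a minimal sketch, not the function from the post: `size.tree` and the `demo` data are made up for illustration, and it uses `is.list()` as the list test, so it will also descend into data frames and fitted model objects.

```r
# Hypothetical variant (not from the original post): return sizes as numeric
# byte counts so they can be flattened and sorted.
size.tree <- function(obj) {
  if (!is.list(obj)) {
    stop("The object passed to this function was not a list.")
  }
  lapply(obj, function(xx) {
    total <- as.numeric(object.size(xx))  # object.size reports bytes
    if (is.list(xx) && length(xx) > 0) {
      # Keep the total for this sublist alongside its parts
      list(total = total, parts = size.tree(xx))
    } else {
      total
    }
  })
}

# Made-up example data, only to exercise the helper
demo <- list(
  models = list(small = rnorm(10), big = rnorm(1e4)),
  V = diag(50),
  fixed.formula = y ~ x
)

# unlist() flattens the nested result and joins the names with dots
sizes <- unlist(size.tree(demo))
head(sort(sizes, decreasing = TRUE))
```

The flattened names (for example `models.parts.big`) show where each entry sits in the hierarchy, and a `total` entry counts everything beneath it, so totals and parts should not be added together.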