R is great for graphics, but I’ve found that the PDFs R produces when drawing large plots can be extremely large. This is especially common when using spplot() to plot a large raster: I’ve made a 15-page PDF full of rasters that was hundreds of MB in size. I don’t need all that detail (every pixel of the raster) represented in the PDF and would rather have it reduced in size somehow. So I wrote an R function to automate the following:
- Take an existing PDF and run ps2pdf on it as an initial compression step. Often this step is all that’s needed.
- Split it into separate single-page files using pdftk.
- Check whether each page is larger than a threshold you specify (I set 5 MB as the default).
- If a page is larger, rasterize that whole page to a PNG file using Ghostscript, then convert the PNG back into a single-page PDF (with ImageMagick’s convert). I used the multicore package to parallelize this step (its mclapply() now lives in R’s built-in parallel package), but this isn’t necessary: that call can be replaced by lapply() to process the pages sequentially.
- Put the separate pages (a mix of the original pages and the rasterized ones) back together into a single PDF.
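The page-selection step in the list above is plain R and can be tried without ps2pdf, pdftk or Ghostscript installed. Below is a minimal sketch, assuming the pages have already been burst into files with pdftk’s default names (pg_0001.pdf, …); the helper `pages_to_rasterize()` and the example sizes are hypothetical, for illustration only.

```r
# Hypothetical helper: given a named vector of per-page file sizes in bytes,
# return the names of the pages larger than a threshold in MB
# (5 MB by default, matching the default described above).
pages_to_rasterize <- function(sizes_bytes, maxsize_mb = 5) {
  sizes_mb <- sizes_bytes / 1e6          # bytes -> megabytes (decimal MB)
  names(sizes_mb)[sizes_mb > maxsize_mb]
}

# Made-up sizes for three burst pages, purely illustrative
sizes <- c(pg_0001.pdf = 1.2e6, pg_0002.pdf = 48e6, pg_0003.pdf = 7.5e6)
pages_to_rasterize(sizes)
# [1] "pg_0002.pdf" "pg_0003.pdf"
```

On real files, `setNames(file.size(files), files)` gives the named vector this expects.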
Here’s the function:
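shrinkpdf <- function(pdf, maxsize = 5, suffix = "_small", verbose = TRUE) {
  pdf <- normalizePath(pdf)                    # absolute path, since the working directory changes below
  out <- paste0(tools::file_path_sans_ext(pdf), suffix, ".pdf")
  td <- tempfile("shrinkpdf"); dir.create(td)  # fresh scratch directory for this call
  wd <- setwd(td); on.exit(setwd(wd))          # pdftk burst writes pg_XXXX.pdf into the working directory
  if (verbose) print("Performing initial compression")
  system(paste0("ps2pdf ", shQuote(pdf), " test.pdf"))
  system("pdftk test.pdf burst")               # one PDF per page: pg_0001.pdf, pg_0002.pdf, ...
  files <- list.files(pattern = "^pg_.*\\.pdf$")
  sizes <- file.size(files) / 1e6              # size of each page in MB
  toobig <- sizes > maxsize
  if (verbose) print(paste0("Resizing ", sum(toobig), " pages: (", paste(files[toobig], collapse = ","), ")"))
  parallel::mclapply(files[toobig], function(i) {  # formerly multicore; lapply() also works (and on Windows)
    system(paste0("gs -dBATCH -dTextAlphaBits=4 -dNOPAUSE -r300 -q -sDEVICE=png16m -sOutputFile=", i, ".png ", i))
    system(paste0("convert -quality 100 -density 300 ", i, ".png ", i))  # replace the page with its raster version
    if (verbose) print(paste("Finished page", i))
  })
  if (verbose) print("Compiling the final pdf")
  system(paste0("gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=", shQuote(out), " ", paste(files, collapse = " ")))
}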
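Deriving the output name by splitting the path on "." (`strsplit(pdf, ".", fixed = TRUE)[[1]][1]`) misbehaves whenever the path contains another dot, such as a relative path like ./maps.pdf or a directory like data.v2/. `tools::file_path_sans_ext()` strips only the final extension. A small comparison sketch; `naive_name()` and `output_name()` are hypothetical helpers and "_small" is just an example suffix.

```r
# Keeps only the text before the first "." anywhere in the path
naive_name <- function(pdf, suffix = "_small") {
  paste0(strsplit(pdf, ".", fixed = TRUE)[[1]][1], suffix, ".pdf")
}

# Drops only the final extension, so dots elsewhere in the path survive
output_name <- function(pdf, suffix = "_small") {
  paste0(tools::file_path_sans_ext(pdf), suffix, ".pdf")
}

naive_name("plots/maps.pdf")     # "plots/maps_small.pdf"
naive_name("./maps.pdf")         # "_small.pdf" (directory and name lost)
output_name("./maps.pdf")        # "./maps_small.pdf"
output_name("data.v2/maps.pdf")  # "data.v2/maps_small.pdf"
```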