Shrinking R’s PDF output

[This article was first published on PlanetFlux, and kindly contributed to R-bloggers.]
R is great for graphics, but I’ve found that the PDFs R produces when drawing large plots can be extremely large. This is especially common when using spplot() to plot a large raster. I once made a 15-page PDF full of rasters that was hundreds of MB in size. Obviously I don’t need all the detail (every pixel of the raster) represented in the PDF and would rather have it reduced in size somehow. So I wrote an R function to automate the following:
  1. Take an existing PDF and run ps2pdf on it as an initial compression step. Often this step is all that’s needed.
  2. Split it into separate single-page files using pdftk.
  3. Check whether each page is larger than some threshold you specify (I set 5MB as the default).
  4. If a page is larger, rasterize that page to a PNG file using ghostscript and convert the PNG back to a single-page PDF with ImageMagick’s convert. I used the multicore package to parallelize this step (its mclapply() now ships in R’s base parallel package), but this isn’t necessary and that call could be replaced by lapply() to run the pages sequentially.
  5. Put the separate pages (perhaps a mix of the originals and the compressed rasters) back together into one PDF.
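
Steps 3 and 4 boil down to a size check followed by a loop over the oversized pages. Here is a minimal sketch of just that logic, separate from the full function. The helper names `pages_over` and `process_pages` are hypothetical (they are not used in the function itself), and the `rasterize` argument is a stand-in for the ghostscript call.

```r
library(parallel)  # provides mclapply(); ships with base R

# Hypothetical helper: return the files whose size is at least `maxsize` MB.
pages_over <- function(files, maxsize = 5) {
  sizes_mb <- file.info(files)$size / 1e6
  files[!is.na(sizes_mb) & sizes_mb >= maxsize]
}

# Hypothetical helper: apply `rasterize` to each page, in parallel or sequentially.
# mclapply() relies on forking, so use parallel = FALSE on Windows.
process_pages <- function(files, rasterize, parallel = FALSE) {
  if (parallel) mclapply(files, rasterize) else lapply(files, rasterize)
}

# Demo on two dummy "pages" in a scratch directory
td <- tempfile("pages")
dir.create(td)
small <- file.path(td, "pg_0001.pdf")
big <- file.path(td, "pg_0002.pdf")
writeBin(raw(10), small)   # 10 bytes
writeBin(raw(2e6), big)    # 2 MB
toobig <- pages_over(c(small, big), maxsize = 1)
# Stand-in for the real rasterizing step: just report the file name
done <- process_pages(toobig, function(f) basename(f))
```

Switching between `lapply()` and `mclapply()` this way keeps the per-page work identical, so the parallel version is purely an optimization.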

Here’s the function:



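 shrinkpdf <- function(pdf, maxsize = 5, suffix = "_small", verbose = TRUE) {
   # mclapply() originally came from the multicore package; it is now in base R's parallel
   require(parallel)
   wd <- getwd()
   on.exit(setwd(wd))  # restore the working directory even if a step fails
   td <- file.path(tempdir(), "pdf")
   if (!file.exists(td)) dir.create(td)
   # Step 1: initial compression with ps2pdf
   if (verbose) print("Performing initial compression")
   system(paste0("ps2pdf ", pdf, " ", td, "/test.pdf"))
   setwd(td)
   # Step 2: split into one file per page (pdftk names them pg_0001.pdf, pg_0002.pdf, ...)
   system(paste0("pdftk ", td, "/test.pdf burst"))
   files <- list.files(pattern = "pg_")
   # Step 3: find the pages at or above the threshold
   sizes <- sapply(files, function(x) file.info(x)$size) / 1e6  # page sizes in MB
   toobig <- sizes >= maxsize
   if (verbose) print(paste0("Resizing ", sum(toobig), " pages:  (", paste(files[toobig], collapse = ","), ")"))
   # Step 4: rasterize each oversized page to PNG, then convert it back to a one-page PDF
   mclapply(files[toobig], function(i) {
     system(paste0("gs -dBATCH -dTextAlphaBits=4 -dNOPAUSE -r300 -q -sDEVICE=png16m -sOutputFile=", i, ".png ", i))
     # overwrites the original page file (pg_0001.pdf etc.) with the rasterized version
     system(paste0("convert -quality 100 -density 300 ", i, ".png ", tools::file_path_sans_ext(i), ".pdf"))
     if (verbose) print(paste("Finished page ", i))
     NULL
   })
   # Step 5: stitch the pages back together
   if (verbose) print("Compiling the final pdf")
   file.remove("test.pdf")
   file.remove(list.files(pattern = "png"))
   setwd(wd)
   # file_path_sans_ext() strips only the extension, so paths such as "./plots.pdf" work too
   out <- paste0(tools::file_path_sans_ext(pdf), suffix, ".pdf")
   system(paste0("gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=", out, " ", td, "/*.pdf"))
   file.remove(list.files(td, full.names = TRUE))
   if (verbose) print("Finished!!")
 }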
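
The function shells out to ps2pdf, pdftk, gs, and ImageMagick’s convert, so it is worth checking that they are on the PATH before calling it. A minimal sketch of such a check followed by a call: `missing_tools` is a hypothetical helper, and "myplots.pdf" is a placeholder file name.

```r
# Hypothetical helper: names of the required command-line tools not found on the PATH.
missing_tools <- function(tools = c("ps2pdf", "pdftk", "gs", "convert")) {
  found <- Sys.which(tools)   # returns "" for tools that are not found
  tools[found == ""]
}

missing <- missing_tools()
if (length(missing) > 0) {
  message("Install these first: ", paste(missing, collapse = ", "))
} else if (exists("shrinkpdf") && file.exists("myplots.pdf")) {
  # Writes myplots_small.pdf alongside the original (placeholder file name)
  shrinkpdf("myplots.pdf", maxsize = 5)
}
```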
