Shrinking R’s PDF output

June 17, 2010
By

(This article was first published on PlanetFlux, and kindly contributed to R-bloggers)

R is great for graphics, but I've found that the PDFs R produces when drawing large plots can be extremely large. This is especially common when using spplot() to plot a large raster: I've made a 15-page PDF full of rasters that was hundreds of MB in size. I don't need all that detail (every pixel of the raster) represented in the PDF and would rather have the file reduced in size. So I wrote an R function to automate the following:
  1. Take an existing PDF and run ps2pdf on it as an initial compression step. Often this step is all that's needed.
  2. Split it into single-page files using pdftk.
  3. Check whether each page is larger than a threshold you specify (I set 5 MB as the default).
  4. If a page exceeds the threshold, rasterize it to a 300 dpi PNG file using Ghostscript and convert that PNG back into a one-page PDF with ImageMagick's convert. I used mclapply() from the multicore package (now part of R's built-in parallel package) to parallelize this step, but this isn't necessary: that call could be replaced by lapply() to process the pages sequentially.
  5. Put the separate pages (perhaps a mix of the original and the rasterized ones) back together into a single PDF with Ghostscript.
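Why rasterizing helps: when a grid is drawn as vector graphics, each cell ends up as its own filled rectangle in the PDF, so a large raster means hundreds of thousands of drawing operations. The sketch below is an illustration, not part of the original workflow. It assumes a synthetic 500 x 500 matrix as a stand-in for a raster layer and uses base graphics' image() rather than spplot(), drawing the same data once as vector cells and once with image()'s useRaster = TRUE argument (a single embedded bitmap, which needs an R version where image() has that argument). Exact sizes will vary with the R version and palette.

```r
## Illustration only: a synthetic 500 x 500 grid stands in for a large raster layer.
set.seed(1)
z <- matrix(runif(500 * 500), nrow = 500)

vec_pdf <- tempfile(fileext = ".pdf")
pdf(vec_pdf)
image(z)                    # each cell is written as its own filled rectangle
invisible(dev.off())

bmp_pdf <- tempfile(fileext = ".pdf")
pdf(bmp_pdf)
image(z, useRaster = TRUE)  # the whole grid is embedded as one bitmap image
invisible(dev.off())

## File sizes in MB (same 1e6 divisor as the size check in step 3)
sizes_mb <- file.info(c(vec_pdf, bmp_pdf))$size / 1e6
names(sizes_mb) <- c("vector cells", "embedded bitmap")
print(round(sizes_mb, 2))
```

The gap between the two numbers is the kind of saving step 4 goes after by turning an oversized page into a single image.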

Here's the function:



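shrinkpdf <- function(pdf, maxsize = 5, suffix = "_small", verbose = TRUE) {
  ## Needs ps2pdf and gs (Ghostscript), pdftk and convert (ImageMagick) on the PATH.
  ## mclapply() now ships with R's built-in parallel package (multicore was retired);
  ## on Windows, replace mclapply() with lapply().
  library(parallel)
  wd <- getwd()
  on.exit(setwd(wd))  # restore the working directory even if something fails
  td <- file.path(tempdir(), "pdf")
  if (!file.exists(td)) dir.create(td)
  if (verbose) print("Performing initial compression")
  system(paste("ps2pdf", shQuote(pdf), shQuote(file.path(td, "test.pdf"))))
  setwd(td)
  ## Burst into single-page files pg_0001.pdf, pg_0002.pdf, ...
  system("pdftk test.pdf burst")
  file.remove("test.pdf")
  files <- list.files(pattern = "^pg_.*\\.pdf$")
  sizes <- file.info(files)$size / 1e6  # page sizes in MB
  toobig <- sizes >= maxsize
  if (verbose) print(paste0("Resizing ", sum(toobig), " pages: (",
                            paste(files[toobig], collapse = ","), ")"))
  mclapply(files[toobig], function(i) {
    png_file <- paste0(i, ".png")
    ## Rasterize the page at 300 dpi, then overwrite it with a one-page PDF of the PNG
    system(paste0("gs -dBATCH -dTextAlphaBits=4 -dNOPAUSE -r300 -q -sDEVICE=png16m ",
                  "-sOutputFile=", png_file, " ", i))
    system(paste("convert -quality 100 -density 300", png_file, i))
    if (verbose) print(paste("Finished page", i))
    invisible(NULL)
  })
  if (verbose) print("Compiling the final pdf")
  file.remove(list.files(pattern = "\\.png$"))
  setwd(wd)
  ## Derive the output name without splitting on every "." in the path
  out <- paste0(tools::file_path_sans_ext(pdf), suffix, ".pdf")
  pages <- file.path(td, sort(files))  # zero-padded names keep the page order
  system(paste0("gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=", shQuote(out),
                " ", paste(shQuote(pages), collapse = " ")))
  file.remove(list.files(td, full.names = TRUE))
  if (verbose) print("Finished!!")
  invisible(out)
}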
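shrinkpdf() calls ps2pdf, pdftk, gs and convert through system() and never checks that they are installed, so a missing program only shows up as a shell error partway through the run. Below is a minimal pre-flight check, a sketch that assumes those exact command names. They can differ by platform: on Windows, Ghostscript's console binary is usually gswin64c, and ImageMagick 7 may provide magick instead of convert.

```r
## Pre-flight check (a sketch, not part of the original function): confirm the
## external programs that shrinkpdf() calls via system() can be found on the PATH.
tools_needed <- c("ps2pdf", "pdftk", "gs", "convert")
found <- Sys.which(tools_needed)     # returns "" for any program not on the PATH
not_found <- tools_needed[found == ""]
if (length(not_found) > 0) {
  message("Not found on PATH: ", paste(not_found, collapse = ", "))
} else {
  message("All external tools found")
}
```

Running this once before calling shrinkpdf() on a large file saves waiting through the initial compression only to hit a "command not found" error at the pdftk or convert step.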
