Fast-track publishing using knitr: exporting images for sharing and press (part III)

January 7, 2014

(This article was first published on G-Forge » R, and kindly contributed to R-bloggers)

Fast-track publishing using knitr is a short series on how I use knitr to speed up publishing in my research. This is the third article in the series, devoted to plots. After reading this post you should have the need-to-know stuff so that you can (1) add auto-numbering to your figures, (2) decide on image formats, (3) choose image resolution, and (4) get anti-aliasing working.

Auto-numbering of figures

In knitr you use the chunk header to declare figure size, type, caption and more. Unfortunately fig.cap does not work by default in markdown. There is a simple remedy for this using knitr’s “hooks”:

    library(knitr)

    # Notify that you want to use the counter.
    # If you set the counter to 3 then it will use
    # that as the starting number. You can also use strings
    # if you, for instance, have a split figure with
    # a "1a" and "1b" setup
    options(figure_counter = TRUE)

    # If you want roman numerals then set:
    # options(figure_counter_roman = TRUE)

    # Evaluate the figure caption after the chunk;
    # sometimes you want to calculate stuff inside the
    # chunk that you want to include in the caption and
    # it is therefore useful to evaluate it afterwards.
    opts_knit$set(eval.after = 'fig.cap')

    # The actual hook
    knit_hooks$set(plot = function(x, options) {
      fig_fn <- paste0(opts_knit$get("base.url"),
                       paste(x, collapse = "."))

      # Some stuff from the default definition
      fig.cap <- knitr:::.img.cap(options)

      # Style and additional options that should be
      # included in the img tag
      style <- c("display: block",
                 sprintf("margin: %s;",
                         switch(options$fig.align,
                                left = 'auto auto auto 0',
                                center = 'auto',
                                right = 'auto 0 auto auto')))

      # Certain arguments may not belong in style,
      # for instance the width and height are usually
      # outside if they do not have a unit specified
      addon_args <- ""

      # This is perhaps a little overly complicated with
      # the loop, but it allows for more out.* parameters
      # if necessary
      on <- grep("^out\\.(height|width)", names(options), value = TRUE)
      for (out_name in on) {
        out_value <- options[[out_name]]
        # Skip options that are not set
        if (length(out_value) == 0) next
        dimName <- substr(out_name, 5, nchar(out_name))
        # Test the option's value (not its name) for a unit
        if (grepl("[0-9]+(em|px|%|pt|pc|in|cm|mm)", out_value))
          style <- append(style, paste0(dimName, ": ", out_value))
        else
          addon_args <- paste0(addon_args, " ", dimName,
                               "='", out_value, "'")
      }

      # Add counter if wanted
      fig_number_txt <- ""
      cntr <- getOption("figure_counter", FALSE)
      if (!identical(cntr, FALSE)) {
        if (is.logical(cntr)) cntr <- 1
        # The figure_counter_str allows for custom
        # figure text, you may for instance want it in
        # bold: Figure %s:
        # The %s is so that you have the option of setting the
        # counter manually to 1a, 1b, etc. if needed
        fig_number_txt <-
          sprintf(getOption("figure_counter_str", "Figure %s: "),
                  ifelse(getOption("figure_counter_roman", FALSE),
                         as.character(as.roman(cntr)),
                         as.character(cntr)))
        if (is.numeric(cntr))
          options(figure_counter = cntr + 1)
      }

      # Put it all together (assumed markup: an img inside
      # a figure element, followed by a figcaption)
      paste0("<figure><img src='", fig_fn, "'",
             addon_args,
             " style='", paste(style, collapse = "; "), "'>",
             "<figcaption>", fig_number_txt, fig.cap,
             "</figcaption></figure>")
    })

That’s it. Put this in your first knitr chunk and all your images with a caption will have a figure counter. If you want to reference the number you can always call getOption("figure_counter"), which lets you insert the next image’s number into your text. If you want to use roman numerals just set options(figure_counter_roman=TRUE).

Image formats

When preparing your manuscript you will need images for two different purposes: small and portable images for sharing, and images suited for press. Knitr allows you to quickly convert from one to the other by adjusting the dev and dpi settings. As a general rule of thumb you want PNG for including images in your Word document and EPS for press. Below I’ll try to go into these formats and more.

Basics

There are two major families of image formats that you need to be aware of:

• Vector formats: A vector image is a set of connections between points. These connections can generate lines or fills (polygons, shapes etc.), and are therefore well suited for plots. The major advantage of vector graphics is that you can scale them losslessly to any desired size. Common vector file formats: SVG (Scalable Vector Graphics), PDF (Portable Document Format), PS (PostScript), and EPS (Encapsulated PostScript). Out of these, EPS is the one most commonly supported by journals; unfortunately I’ve had trouble sharing (my favorite) SVG files.

• Raster formats: This is the dominating image type, useful for photos and similar applications but less suited for plots. Here you have a grid where each cell is a pixel with a set color, and the size of the grid is the resolution. The major downside of raster images is that if you make them larger the square pixel shape becomes visible, i.e. you get rough edges like in old video games. This group can be further divided into lossy formats, such as JPEG, and lossless formats, such as PNG. This only indicates whether the image compression loses information or retains every detail; it is not the same as the lossless resizing of vector formats. Common raster file formats: PNG (Portable Network Graphics), JPEG/JPG (Joint Photographic Experts Group), and TIFF (Tagged Image File Format).

Sharing images

Although SVG is my favorite format you can’t insert SVG files into your Word document (at least not in my 2010 version). I therefore rely on the PNG format at 96 DPI for sharing. Telling knitr to use this for all the images in your document is really easy; just add this code before any plots (add it only once in your document):

    library(knitr)
    opts_chunk$set(dev="png",
                   dev.args=list(type="cairo"),
                   dpi=96)

If you browse your figure folder (located in the same folder as your Rmd document) you will find all the PNG images after knitting. An important detail is that you want to disable embedding these images in the HTML document that knitr generates, as LibreOffice/Word can’t handle embedded images; see my previous post on setting up an .Rprofile.

Note: you can also set the dev option for each chunk, but since you usually want all images to be of the same type I prefer to use opts_chunk. For this to work smoothly even when I don’t knit the document, I make sure to load the knitr package, which avoids this error:

    Error: object 'opts_chunk' not found

Press

Vector graphics are excellent for publication and my preferred way of exporting for publication. Unfortunately few journals accept SVG, and you are often stuck with the somewhat limited EPS format. Setting up EPS output is really easy; just change the previous settings into:

    library(knitr)
    opts_chunk$set(dev='postscript')

If you browse your figure folder you should now see all the EPS images after knitting. The main problem I’ve had with EPS is that the format does not handle transparencies. For instance, you may have generated a beautiful X that renders as intended in the PNG format. But when you open the EPS image in Inkscape, the transparent polygon has suddenly been removed. If you remove the transparency you get a nice image, but with less finesse.

If you have transparencies and want to retain them, I recommend that you try the TIFF format when submitting to journals. They usually support it, but make sure you compress the images using the compression="lzw" argument or your images may become huge; they can actually surpass the journal’s maximum image size.

    library(knitr)
    opts_chunk$set(dev="tiff",
                   dev.args=list(compression="lzw"),
                   dpi=300)

    # The code for the x-mark
    library(ggplot2)
    polygon_df1 <- data.frame(x=c(0,0.75,1,.25), y=c(0,1,1,0))
    polygon_df2 <- data.frame(x=c(0,0.75,1,.25), y=c(1,0,0,1))
    ggplot(polygon_df1, aes(x=x, y=y)) +
      geom_polygon(fill="steelblue", col="steelblue") +
      # The last two hex digits (99) are the alpha channel,
      # i.e. this polygon is semi-transparent
      geom_polygon(data=polygon_df2, fill="#55558899", col="#55558899") +
      scale_x_continuous(expand = c(0,0)) +
      scale_y_continuous(expand = c(0,0)) +
      xlab("") + ylab("") +
      theme(line = element_blank(),
            text = element_blank(),
            title = element_blank())
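Since the sharing and press setups differ only in a handful of chunk options, it can be convenient to keep both in one place. The following is a minimal sketch, not part of knitr itself: figure_settings is a hypothetical helper name, and the two presets simply repeat the PNG and TIFF settings used in this article.

```r
# Hypothetical helper: returns the chunk options for a given target.
# The presets mirror the PNG (sharing) and TIFF (press) settings above.
figure_settings <- function(target = c("share", "press")) {
  target <- match.arg(target)
  switch(target,
         share = list(dev = "png",
                      dev.args = list(type = "cairo"),
                      dpi = 96),
         press = list(dev = "tiff",
                      dev.args = list(compression = "lzw"),
                      dpi = 300))
}

settings <- figure_settings("press")

# Apply the settings only if knitr is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  do.call(knitr::opts_chunk$set, settings)
}
```

Switching the whole document from sharing to press is then a one-word change in the call to figure_settings().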

Resolution (DPI)

For screen output use 96 or 120 DPI, while for print you use either 300 or 600 DPI. DPI stands for Dots Per Inch and applies only to rasterized images. R multiplies the figure width and height (in inches) by the DPI to get the size of the image in pixels. So while you may have specified a certain width, it is the resulting number of pixels that gives the image its size: at a low DPI the image will appear small since there are few pixels, while a high DPI will result in a large image.
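The relationship between figure size, DPI and pixel dimensions is plain multiplication. Here is a minimal sketch in base R; fig_pixels is a hypothetical helper name, and the 7 by 5 inch figure is only an assumed example size.

```r
# Hypothetical helper: pixel dimensions of a raster image
# given its size in inches and the DPI setting.
fig_pixels <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

fig_pixels(7, 5, dpi = 96)   # screen: 672 x 480 pixels
fig_pixels(7, 5, dpi = 300)  # print: 2100 x 1500 pixels
```

The same 7 by 5 inch figure thus holds almost ten times as many pixels at 300 DPI as at 96 DPI, which is why print-resolution files look very large on screen.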

DPI comes with a long history and it is important to remember that there is a difference between print and screen. Originally Macintosh (Apple) used 72 DPI; this was later increased to 96 on Microsoft computers.

I use 96 DPI for screen resolution as it gives, in my opinion, images of roughly the size that I want. Paper/print on the other hand is always high-resolution, and anything below 300 DPI will appear as poor quality.

Anti-aliasing

Anti-aliasing is probably the simplest change you can make to give your plots a professional look. While all vector images are automatically anti-aliased, you need to add this to rasterized images using the option type="cairo". I previously dedicated a whole post to dealing with the Cairo and cairoDevice packages, just to find out that these are obsolete in more recent R versions. To get this into knitr all you need to add is a dev.args list that contains type="cairo":

    opts_chunk$set(dev="png",
                   dev.args=list(type="cairo"),
                   dpi=96)
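Not every R build is compiled with cairo support, so it can be worth checking before requesting it. Below is a minimal sketch using base R's capabilities(); png_dev_args is a hypothetical helper name, and falling back to the platform default device type is my assumption about what you would want when cairo is missing.

```r
# Hypothetical helper: request the cairo backend only when this
# R build supports it, otherwise keep the platform default.
png_dev_args <- function() {
  has_cairo <- isTRUE(unname(capabilities("cairo")))
  if (has_cairo) list(type = "cairo") else list()
}

# Apply the settings only if knitr is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "png",
                        dev.args = png_dev_args(),
                        dpi = 96)
}
```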

Note that the antialias argument seems to do nothing for the actual graphics; you can compare these three alternatives:

• dev.args = list(type="windows")
• dev.args = list(type="windows", antialias="cleartype")
• dev.args = list(type="cairo")

It is a subtle difference, but without it the plot looks unrefined, especially on a poor screen. Another thing that is good to know is that fills are not anti-aliased. You therefore need to add a thin line in the same color to your fills to get the desired anti-aliasing. Base and lattice plots both have the thin line by default, while for ggplot2 you need to explicitly declare that you want the line; see how I use the col= and fill= arguments to generate the plots above.

    library(ggplot2)
    line <- data.frame(x=c(0.25,1), y=c(1,.45))
    polygon <- data.frame(x=c(0,0.75,1,0), y=c(.75,.20,.20,1))
    aa <- ggplot(line, aes(x=x, y=y)) +
      # geom_line() has no fill aesthetic, so only col is set
      geom_line(col="steelblue", lwd=2) +
      # Matching col= and fill= gives the fill an anti-aliased edge
      geom_polygon(data=polygon, fill="#555588", col="#555588") +
      scale_x_continuous(expand = c(0,0)) +
      scale_y_continuous(expand = c(0,0)) +
      theme_bw() +
      xlab("") + ylab("") +
      theme(line = element_blank(),
            text = element_blank())

    aa + annotate("text", label="Not\nanti-\naliased",
                  size=6, y=.93, x=.7)

Or compare these two plots: