A Few Tips for Writing an R Book

June 3, 2013
By

(This article was first published on Yihui Xie, and kindly contributed to R-bloggers)

I just finished fixing (hopefully all) the problems in the knitr book returned from the copy editor. David Smith has kindly announced this book before I do. I do not have much to say about this book: almost everything in the book can be found in the online documentation, questions & answers and the source code. The point of buying this book is perhaps you do not have time to read through all the two thousand questions and answers online, and I did that for you.

This is my first book, and obviously there have been a lot for me to learn about writing a book. In retrospect, I want to share a few tips that I found useful (in particular, for those who plan to write for Chapman & Hall):

1. although it sounds like shameless self-promotion, using knitr made it a lot easier to manage R code and its output for the book; for example, I could quickly adapt to R 3.0.1 from 2.15.3 after I came back from a vacation; if I were to write a second edition, I do not think I will have big trouble with my R code in the book (it is easy to make sure the output is up-to-date);
2. I put my source documents under version control, which helped me watch the changes in the output closely; for example, I noticed the source code of the function fivenum() in base R was changed from R 2.15.3 to 3.0.0 thanks to GIT (R core have been updating base R everywhere!);
3. (opinionated) some people might be very bored to hear this: use LyX instead of plain LaTeX... because you are writing, not coding; LaTeX code is not fun to read...
4. for the LaTeX document class krantz.cls (by Chapman & Hall):
• to solve the only stupid problem in LaTeX (i.e., floating environments float to silly places by default), use something like this
  \renewcommand{\textfraction}{0.05}
\renewcommand{\topfraction}{0.8}
\renewcommand{\bottomfraction}{0.8}
\renewcommand{\floatpagefraction}{0.75}


I'm aware of the float package and the H option, and options like !tbp; I just do not want to force LaTeX to do anything. It may or may not be happy at some point.

• put \usepackage{emptypage} in the preamble to make empty pages really empty, as required by the copy editor.
• the document class krantz.cls does not work with the hyperref package, meaning that you cannot create bookmarks in the PDF; I have posted the solution here.
5. for authors whose native language is not English like me, here is a summary of my problems in English:
• when you want to use which, use that instead, unless there is a comma ahead, or you really want to emphasize a very specific object; e.g.,

"here is a package that is helpful" (correct)

"here is a package which is helpful" (wrong)

"we will introduce an extremely important technology next, which has revolutionized the life of poor statisticians"

• it is "A, B, and C" instead of "A, B and C"

• do not forget the comma in other places, either: "e.g.,", "i.e.,", "foo and bar, respectively"; actually, try to use the comma whenever possible to break long sentences into shorter pieces
6. for the plots, use the cairo_pdf() device when possible; in knitr, this means the chunk option dev = 'cairo_pdf'; the reason for the choice of cairo_pdf() over the normal pdf() device is that it can embed fonts in the PDF plot files, otherwise the copy editor will require you to embed all the fonts in the final PDF file of the book; normally pdflatex will embed fonts, and if there are fonts that are not embedded, it is very likely that they are from R graphics;
7. include as many figures as possible (I have 51 figures in this 200-page book), because this will make the number of pages grow faster (I'm evil) so that you will not feel frustrated, and the readers will not fall into the hell of endless text, just pages after pages;
8. prepare an extra monitor for copyediting;
9. learn a little bit about pdftk, because you may want to use it finally, e.g., replace one page with a blank page in the frontmatter;

One thing I did not really understand was the punctuation marks like commas and periods should go inside quotation marks, e.g.,

I have "foo" and "bar."

This makes me feel weird. I'm more comfortable with

I have "foo" and "bar".

There was also one thing that I did not catch by version control -- one figure file went wrong and I did not realize it, because normally I do not put binary files under version control. Fortunately, I caught it by my eyes. Karl Broman mentioned the same problem to me a while ago. I know there are tools for comparing images (ImageMagick, for example), and I was just too lazy to learn them.

I will be glad to know the experience of other authors, and will try to update this post according to the comments.