Automatic pdf generation and versioning with odfWeave

[This article was first published on Social data blog, and kindly contributed to R-bloggers.]

odfWeave is a great tool for your reproducible research workflow, using R to produce reproducible OpenOffice reports. But a few pieces are lacking.

– It is a bit of a drag, every time you want to make a pdf, to have to find and open your output document and click on the pdf button.

– You have to manually change document names to keep track of versions and so that you know which source file produced which odt and pdf output files.

– If you are collaborating with others you are probably used to having tracked changes switched on, but odfWeave will not play nicely with .odt documents which have tracked changes.

The following workflow addresses all these issues. I don’t suppose anyone would want to adopt exactly this workflow, but here’s what I do in the hope it might give others some bright ideas.

First you will need to enable printing pdfs from odt documents on the command line. I followed the tips on… to add a virtual pdf printer on my Ubuntu system, as follows. I guess there is something similar for Windows.

1) Install cups-pdf from the repositories (sudo apt-get install cups-pdf)

2) Browse to http://localhost:631/admin/ and set up your new virtual printer: click on add printer, choose “cups-pdf”, continue, choose “generic”, and install. Now you can run commands like “ooffice -pt Virtual_PDF_Printer somefile.odt” and a pdf will appear in ~/PDF/. And we can use system("ooffice -pt Virtual_PDF_Printer somefile.odt") from inside R to do the same. See where this is going?

3) Give your source odt file one name that never changes; I use “in.odt”. The function below will make sure you don’t need to fiddle about changing the filenames in the odfWeave command every time you have a new version.

4) Run odfWeave("in.odt", "out.odt") as usual

5) Define a function something like this:

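makeVersions <- function() {  # the function name is only a placeholder
  ourDate=format(Sys.time(), "%Y_%b_%d_%H:%M")
  system("mv out.odt Backups/out.odt")
  system("ooffice -pt Virtual_PDF_Printer Backups/out.odt")
  # wait until the virtual printer has written the pdf
  while (!file.exists(path.expand("~/PDF/out.pdf"))) Sys.sleep(1)
  system(paste("mv ~/PDF/out.pdf Backups/",ourDate,"_out.pdf",sep=""))
  file.copy("in.odt", paste("Backups/",ourDate,"_in.odt",sep=""))
  system(paste("mv Backups/out.odt Backups/",ourDate,"_out.odt",sep=""))
}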

That might look a bit different on Windows or Mac; I don’t know. Sorry if the code is not very elegant, but it seems to work. The while statement is there to make the function wait until the pdf has been produced.
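For anyone who would rather not depend on mv, here is a rough sketch of the same idea using R's own file functions, with a timeout on the wait so the function cannot hang forever if the printer never delivers. The helper names (waitForFile, stampName, printCommand, archiveRun) are made up for illustration, and the timestamp uses "-" instead of ":" on the assumption that colons in file names cause trouble on some systems. I have not tried it on Windows or Mac.

```r
# Hypothetical helper: wait until `path` exists, giving up after `timeout` seconds.
# Returns TRUE if the file appeared, FALSE if we timed out.
waitForFile <- function(path, timeout = 60, interval = 0.5) {
  start <- Sys.time()
  while (!file.exists(path)) {
    waited <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (waited > timeout) return(FALSE)
    Sys.sleep(interval)
  }
  TRUE
}

# Hypothetical helper: build "<dir>/<stamp>_<name>" for a versioned copy.
stampName <- function(name, stamp, dir = "Backups") {
  file.path(dir, paste0(stamp, "_", name))
}

# Hypothetical helper: build the print command from the post.
# shQuote() protects printer and file names that contain spaces.
printCommand <- function(odt, printer = "Virtual_PDF_Printer") {
  paste("ooffice -pt", shQuote(printer), shQuote(odt))
}

# Sketch of the whole step. Assumes out.odt and in.odt are in the working
# directory, Backups/ already exists, and cups-pdf writes to ~/PDF/out.pdf.
archiveRun <- function(stamp = format(Sys.time(), "%Y_%b_%d_%H-%M")) {
  pdf <- path.expand("~/PDF/out.pdf")
  system(printCommand("out.odt"))
  if (!waitForFile(pdf)) stop("no pdf appeared in ~/PDF/")
  # copy then delete, in case ~/PDF and Backups are on different file systems
  file.copy(pdf, stampName("out.pdf", stamp))
  file.remove(pdf)
  file.copy("in.odt", stampName("in.odt", stamp))
  file.rename("out.odt", stampName("out.odt", stamp))
}
```

After odfWeave("in.odt", "out.odt") has run, calling archiveRun() should leave the three stamped files in Backups. Note that a file existing does not strictly guarantee the printer has finished writing it, so a short extra pause may still be wise for long documents.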



This will add to your Backups subdirectory:

1) a copy of the source document

2) a copy of the output document

3) a pdf of the output document;

with their names all preceded by the same date and time, so you know what belongs to what. You might have to create this directory first. And if you want to send a tracked-changes version to a collaborator, just open the latest odt output document and run “compare documents” with any previous version.
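Creating the directory, and finding the most recent output documents to compare, can also be done from R. This is only a sketch: ensureBackupDir and latestOutputs are names I have invented here, and it assumes the archived output files end in "_out.odt" as in the function above.

```r
# Hypothetical helper: create the backup directory if it does not exist yet.
ensureBackupDir <- function(dir = "Backups") {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  invisible(dir)
}

# Hypothetical helper: list archived output documents, newest first,
# ordered by file modification time.
latestOutputs <- function(dir = "Backups") {
  files <- list.files(dir, pattern = "_out\\.odt$", full.names = TRUE)
  files[order(file.mtime(files), decreasing = TRUE)]
}
```

The first two entries of latestOutputs() would then be the pair of documents to open for “compare documents”.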
