Most of the statistics work I do now is reproducible research. This
can offer a big advantage for clients, but of course that doesn’t
necessarily mean they realise it …

Below is a text we have been pasting in at the bottom of the source
documents (and which therefore appears in the PDFs) to explain
reproducible research. I would be very interested if anyone has any
better ideas …

This is a reproducible research document.

This approach has the following advantages:
• making it easier for us to return to the data and analyses in the future and repeat or extend them
• making it easier for the client to do the same without having to contact us
• enabling other researchers to repeat and verify these findings themselves, even automatically if they desire
• ensuring complete transparency of the results.

Concretely, this means that the original SPSS and other data files
will not be changed at all. All recoding, data cleaning, omission of
cases etc. is carried out in syntax. In fact, everything in this report
document itself (tables, graphics, statistics mentioned within the
text) is produced entirely by the following procedure.

A word processing document (“source file”) is prepared which is
essentially the final report, complete with introduction, chapter
headings, commentary etc., together with blocks of syntax wherever
statistical results are required, in particular tables, graphics and
inline results. A single syntax file is run which takes the source file
and creates a second document, the present report, which is identical
to the source file except that the blocks of syntax are replaced by the
results of the syntax (tables, graphics, etc.).

So there is no cutting-and-pasting or editing of data in the data
files, nor is there, for example, any manual editing of table data or
graphics. At each point in this report at which data preparation is
discussed, the interested reader will find, at the same point in the
source file, the syntax which actually carries out that data
preparation. Likewise, at each point in this report at which tables,
graphics etc. are displayed, the interested reader will find, at the
same point in the source file, the syntax which actually constructs
those tables and graphics.

The source document and datasets can therefore be made available to
third parties, who can then repeat these calculations, see exactly how
they are arrived at, and extend the analyses at will.
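
As a rough illustration of the procedure described above, here is a minimal Python sketch. It is not the Sweave/R setup the report was produced with, only the same idea in miniature: a source text containing blocks of syntax is turned into a report in which each block is replaced by its result, and cases are omitted in code rather than by editing the raw data. The `{{ ... }}` marker, the `weave` function and the age values are all made up for this example.

```python
import re
import statistics

# Hypothetical chunk marker: {{ expression }}. This is NOT Sweave's syntax,
# only a stand-in for "a block of syntax inside the source file".
CHUNK = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def weave(source_text, env):
    """Return a copy of source_text with every chunk replaced by its result."""
    def run_chunk(match):
        # Evaluate the chunk using only the names supplied in env.
        result = eval(match.group(1).strip(), {"__builtins__": {}}, env)
        return str(result)
    return CHUNK.sub(run_chunk, source_text)


# The raw data are never edited; cleaning happens in code on a derived copy.
raw_ages = [34, 29, -1, 41, 38]            # made-up data; -1 marks a missing value
ages = [a for a in raw_ages if a >= 0]     # omission of cases, done in syntax

# Names the chunks are allowed to use.
env = {"ages": ages, "len": len, "mean": statistics.mean}

source = "We analysed {{ len(ages) }} cases; the mean age was {{ mean(ages) }}."
report = weave(source, env)
print(report)
```

Re-running the script after the data change regenerates every number in the report text, which is the point of the approach: nothing in the report is typed in or pasted by hand.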

Unfortunately, to the best of our knowledge the statistics program
most familiar to social scientists, SPSS, does not fulfill all of
these requirements; in particular, it cannot produce a complete report
automatically. So the work will be carried out using the package
Sweave for the open-source statistics program R. However, intermediate
datasets in SPSS format, including all recoded and calculated
variables, can also be provided, so that as much as possible of the
above can be accomplished with SPSS as well.

In detail, the original word processing file is written using the free
programs LibreOffice (www.documentfoundation.org) or LyX (www.lyx.org),
which are available for Windows, Mac and Linux. This file is then
transformed into the present PDF report (the document you are looking
at now) using the R statistics engine, www.r-project.org, which is also
available free on all platforms.

