I have recently been thinking about the relationship between text in a final report and data analysis. The broader concern is with making the conduct and reporting of statistical analyses more transparent. I am inspired by the ideas of literate programming, Sweave, and open access to data.
Something to aspire to:
- Raw data is shared (ethics, copyright, and other considerations permitting).
- Code is shared that shows how the data was imported, transformed, and analysed. This code is well written, commented, and documented.
- The report is shared openly rather than requiring a paid subscription.
- Report output including tables, figures, and some text is linked directly to the analyses in code.
While the aspirations transcend R, I like the prospect of having analyses in R integrated with a final report. The inclusion of tables and figures is, at least conceptually, a straightforward idea. However, the inclusion of text in a results section is a little fuzzier. Text in a results section (I’ll call it “results text” for short) surely varies in how it relates to the actual analyses. Thus, I had the following questions: 1) What is the unit of results text? 2) How does results text vary, and what should be automatically supplied by R? 3) For results text that should not be supplied by R, how should it be integrated into an analysis process?
- A unit of results text is any continuous string of text. For example, “r = .23” and “F(2, 23) = 7.89” are both continuous strings of text. Such a unit includes multiple elements of information, but it could be imported from R as a single string, and only one additional piece of information would be required to define the text’s location in the report.
- Results text can be classified as either numeric or qualitative. Numeric results text includes correlations, means, percentages, significance values, effect size measures, and so on, together with any standardised reporting text that surrounds their presentation (e.g., “r = ” in “r = .23”, or the F, brackets, and equals sign in an F test). Qualitative results text includes a wide range of content: a) description of analysis steps; b) justification of analyses; c) general comments about the pattern of results; d) non-numeric statements relating to statistical significance, direction of effect, and effect size; e) statements about the relationship between results and expectations, possibly with some explanation.
- Results text varies in the degree to which it is contingent on the actual results of data analyses. At one end of the continuum there is text that is not influenced by the results at all (e.g., text introducing a table or figure; text justifying an analysis strategy; text setting out the steps taken to produce the results). Numeric results text is at the other end of the continuum and is altered by the slightest of changes to the data or analytic approach (e.g., the sample size or exact correlation will change after a case is deleted). In between, there is a wide variety of contingent qualitative results text (e.g., comments on the general pattern of results; comments about the size of a relationship).
- Numeric results text should be integrated automatically into the final report.
- Qualitative results text should be distinguished based on whether it is contingent on the results or not.
- Noncontingent qualitative results text should be written up first.
- Contingent qualitative results text should be written up after examining the contingent analysis output.
- Contingent qualitative results text should be flagged in the word processor.
- Contingent qualitative results text is based on underlying data and output. Whenever this data or output changes, the text should be audited to see whether it needs to change as a consequence.
- Placeholders for contingent results text (numeric or qualitative) can be placed in the document in preparation for completion of analyses.
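To make these points concrete, here is a minimal sketch of how the pieces might fit together. It is written in Python purely for illustration; the same idea would apply to R and Sweave. It assumes a report template with named placeholders, numeric results text generated as single strings (using the “r = .23” and “F(2, 23) = 7.89” examples from above), and a simple fingerprint that flags contingent qualitative text for auditing when the numbers it depends on change. All function names, placeholder keys, the sentence wording, and the changed value of .31 are hypothetical.

```python
import hashlib
import re
from string import Template


def drop_leading_zero(value, digits=2):
    """Format a statistic bounded by 1 (e.g. r) without its leading zero."""
    text = f"{value:.{digits}f}"
    return re.sub(r"^(-?)0\.", r"\1.", text)


def r_text(r):
    """One unit of numeric results text for a correlation."""
    return f"r = {drop_leading_zero(r)}"


def f_text(f, df1, df2):
    """One unit of numeric results text for an F test."""
    return f"F({df1}, {df2}) = {f:.2f}"


def fingerprint(units, keys):
    """Hash the numeric units that a contingent sentence depends on."""
    joined = "|".join(f"{k}={units[k]}" for k in sorted(keys))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Each unit is a single string plus one key giving its location in the report.
units = {
    "r_age_score": r_text(0.23),
    "f_group": f_text(7.89, 2, 23),
}

# Placeholders can sit in the draft before the analyses are complete.
template = Template(
    "Age correlated with score, $r_age_score. Groups differed, $f_group."
)
report = template.substitute(units)

# The hypothetical comment "the correlation was small" depends on r_age_score,
# so record a fingerprint of that unit when the comment is written.
written_against = fingerprint(units, ["r_age_score"])

# Later the data change (hypothetical new value); the comment is flagged.
updated_units = dict(units, r_age_score=r_text(0.31))
needs_audit = fingerprint(updated_units, ["r_age_score"]) != written_against

print(report)
print("Audit contingent text:", needs_audit)
```

The design choice worth noting is that the numeric text is regenerated on every run, whereas the qualitative comment is never rewritten automatically; it is only flagged so that a person can check whether it still holds.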