# Test Driven Analysis?

At the last LondonR meeting, Francine Bennett from Mastodon C shared some of her experience and findings from an analysis of a large prescriptions data set from the UK's National Health Service (NHS). However, it was her last slide that I found the most thought-provoking. It asked for the definition of the following term:

Test-driven analysis?

Francine explained that test-driven development (TDD) is a concept often used in software development for quality assurance, and she wondered if a similar approach could also be used for data analysis. Unfortunately, the audience couldn’t provide her with *the* answer, but many expressed that they face similar challenges. So do I.

Indeed, how do I go about test-driven analysis? How do I know that I haven’t made a mistake when I start an analysis of a new data set? Well, I don’t. But I try to mitigate the risks. Similar to TDD, I consider which outputs I should expect from my analysis. Those expected outputs form the test scenarios of my analysis. Basically, I try to write down everything I know before I start working with the data, e.g.

- any other data sets or reports I can use for cross referencing,
- any back-of-the-envelope analysis I can carry out to provide ballpark answers,
- any relativities and ratios which should hold true,
- any known boundaries and thresholds,
- test scenarios for my code with small, well-known data for which I know the outcome,
- names of experts who could sense-check and peer-review my output.
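
To make the list above a little more concrete, here is a minimal Python sketch of how some of those items could be written down as checks before touching the real data. Everything in it is made up for illustration: the record fields (`items`, `cost`), the functions `mean_cost_per_item` and `check_expectations`, the reference total and the ballpark range are my assumptions, not anything from Francine's NHS analysis.

```python
import math


def mean_cost_per_item(records):
    """The analysis step under test: average cost per prescription item."""
    total_cost = sum(r["cost"] for r in records)
    total_items = sum(r["items"] for r in records)
    return total_cost / total_items


def check_expectations(records, reference_total_cost, ballpark_range,
                       tolerance=0.05):
    """Return a list of failed expectations; an empty list means all passed.

    All thresholds are hypothetical and would be written down *before*
    looking at the data.
    """
    failures = []

    # Known boundaries: item counts must be positive, costs non-negative.
    if any(r["items"] <= 0 or r["cost"] < 0 for r in records):
        failures.append("boundary: non-positive items or negative cost")

    # Cross-reference: total cost should be close to an external figure.
    total_cost = sum(r["cost"] for r in records)
    if not math.isclose(total_cost, reference_total_cost, rel_tol=tolerance):
        failures.append("cross-reference: total cost differs from reference")

    # Ballpark ratio: mean cost per item should fall in the expected range.
    low, high = ballpark_range
    if not low <= mean_cost_per_item(records) <= high:
        failures.append("ballpark: mean cost per item outside expected range")

    return failures


# Small, well-known data for which the outcome can be worked out by hand:
# 200.0 total cost over 40 items gives a mean of exactly 5.0.
toy = [
    {"practice": "A", "items": 10, "cost": 50.0},
    {"practice": "B", "items": 30, "cost": 150.0},
]

assert mean_cost_per_item(toy) == 5.0
assert check_expectations(toy, reference_total_cost=200.0,
                          ballpark_range=(1.0, 20.0)) == []
```

The checks return a list of failures rather than stopping at the first one, so a single run shows every expectation the data violates. The last item on the list, asking an expert, has no code equivalent.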

*Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.*

