Recently I had the opportunity to do a job swap with one of the guys in the laboratory here at HSL. I helped out with the mass-spectrometry and James helped me with the data analysis. Two very useful things came out of this.
Firstly, it’s been very informative to see how the data I get is created. I tend to assume that the numbers that are given to me are either correct or mistakes. The reality, though, is more subtle. One thing that surprised me was the lengths that the chemists have to go to in order to make sure that their instruments give sensible answers. As well as testing urine samples, you need to test blank samples (to clean out the spectrometer’s tubes), standard samples (to calibrate the machine) and quality control samples (to check that the calibration is correct). Even then, it wasn’t entirely clear that you would get the same answer if you ran the samples twice.
The project was based around testing thallium levels in the general population. To give an idea of how much we could trust the data, I re-analysed 50 of the samples that James had run. The tricky bit was the pipetting; there’s a surprising art to avoiding air bubbles.
As you can see, my results were consistently lower than James’s. Taking James as the gold standard in mass-spectrometry skill and myself as the worst-case scenario, you can see that we should only trust the results to the nearest order of magnitude. This is not a trivial exercise: it demonstrates what would happen if James were replaced by an idiot. (All too possible, depending on what George Osborne says later today.)
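For anyone who wants to put a number on a comparison like this, here is a minimal sketch in Python. It isn't the code used for the analysis, and the concentrations are made up purely for illustration. The function name `compare_repeats` and the choice of summaries are my own assumptions. The idea is to take the log10 of the ratio between each repeat and its reference measurement, so that 0 means perfect agreement and -1 means the repeat was ten times lower.

```python
import math
import statistics


def compare_repeats(reference, repeat):
    """Summarise how far repeat measurements sit from reference ones.

    Both arguments are equal-length sequences of positive concentrations
    for the same samples, in the same order.
    """
    if len(reference) != len(repeat):
        raise ValueError("need one repeat per reference measurement")
    # log10 of the ratio: 0 means agreement, -1 means ten times lower.
    log_ratios = [math.log10(b / a) for a, b in zip(reference, repeat)]
    return {
        "n": len(log_ratios),
        "n_lower": sum(r < 0 for r in log_ratios),
        "median_log10_ratio": statistics.median(log_ratios),
        "max_abs_log10_ratio": max(abs(r) for r in log_ratios),
    }


# Made-up numbers purely for illustration, NOT the study's data.
james = [0.20, 0.35, 0.10, 0.50]
me = [0.10, 0.07, 0.05, 0.25]

summary = compare_repeats(james, me)
print(summary)
```

If `max_abs_log10_ratio` stays below 1, the two analysts never disagree by more than a factor of ten, which is one way of reading "trust the results to the nearest order of magnitude".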
The second really good thing to come out of this was that I managed to drill into James the importance of manipulating data with code instead of manually editing spreadsheets. He in turn passed on this message when we presented our findings to the lab. (Main finding: no-one is about to die of thallium poisoning.) After the presentation, one of our toxicologists came up to me and said:
“I finally get it. I understand why mathematicians keep saying that you shouldn’t use Excel. It’s because in order for your work to be reproducible and auditable, you need the trail of code to see what you’ve done.”
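To make that "trail of code" concrete, here is a minimal sketch in Python of the kind of step that often gets done by hand in a spreadsheet: throwing away the blank, standard and quality control rows and keeping only the urine results. Everything specific here is an assumption for illustration. The column names, sample IDs and values are invented, and `load_urine_results` is a hypothetical helper, not our actual pipeline.

```python
import csv
import io

# Hypothetical instrument export; column names and values are invented.
RAW = """sample_id,sample_type,thallium_ug_per_l
B1,blank,0.00
S1,standard,1.00
QC1,qc,0.52
U001,urine,0.21
U002,urine,
U003,urine,0.18
"""


def load_urine_results(text):
    """Return urine results plus the IDs of urine rows that were dropped."""
    results = []
    dropped = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["sample_type"] != "urine":
            continue  # blanks, standards and QC samples are not results
        value = row["thallium_ug_per_l"].strip()
        if not value:
            # Record what was excluded instead of silently deleting it.
            dropped.append(row["sample_id"])
            continue
        results.append((row["sample_id"], float(value)))
    return results, dropped


results, dropped = load_urine_results(RAW)
print(results)
print(dropped)
```

The raw export is never edited. Every exclusion is a line of code that someone else can read, question and rerun, which is exactly what you lose when rows are deleted by hand.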