It’s an old favourite of this blog, isn’t it? We had Gene name errors and Excel: lessons not learned (2012), followed by Data corruption using Excel: 12+ years and counting (2016). Then, perhaps most depressingly of all, came the conclusion of the trilogy: When your tools are broken, just change the data (2019-20).
Well, I’m happy (?) to see the publication of the latest instalment from Mark Ziemann’s group, Gene name errors: Lessons not learned, its title inspired in part by that of my first post. Here’s the accompanying Twitter thread. Summary: it’s even worse than we thought.
I’m tagging this one with the R tag, because the group are publishing monthly RMarkdown reports here. Congratulations, Nature Communications!
As a footnote: you don’t escape this kind of thing when you leave bioinformatics. Yesterday, in a data science meeting, I heard a colleague declare that “we won’t be putting anything into production that relies on data supplied to us as spreadsheets”.