Reproducibility has come a long way in political science since I began my PhD all the way back in 2008. Many major journals now require that replication materials be made available, either on their websites or through a service such as the Dataverse Network.
This is certainly progress. But what are political scientists actually supposed to do with this new information? It does help avoid duplicated effort: researchers don’t need to gather data or program statistical techniques that have already been gathered or programmed. It promotes better research habits. It definitely provides “procedural oversight”. We would be highly suspicious of results from authors who were unable or unwilling to produce their code/data.
However, there are lots of problems that data/code availability requirements do not address. Apart from a few journals like Political Science Research and Methods, most journals have no standing policy to check the replication materials’ veracity. Reviewers rarely have access to manuscripts’ code/data. Even if they did have access to it, few reviewers would be willing or able to undertake the time-consuming task of reviewing this material.
Do political science journals care about coding and data errors?
What do we do if someone replicating published research finds clear data or coding errors that have biased the published estimates?
Note that I’m limiting the discussion here to honest mistakes, not active attempts to deceive. We all make these mistakes. To keep it simple, I’m also only talking about clear, knowable, and non-causal coding and data errors.
Probably the most responsible action a journal could take on finding results that are clearly biased by coding/data errors would be to attach a note detailing the bias directly to the original article. This way readers will always be aware of the correction and will have the best information possible. This is a more efficient way of getting corrected information out than relying on some probabilistic process where readers may or may not stumble upon information posted elsewhere.
As far as I know, however, no political science journal has a written procedure (please correct me if I’m wrong) for dealing with this new information. My sense is that there are a series of ad hoc responses that closely correspond to how the bias affects the results:
The situation where a journal is most likely to do anything is when correcting the bias makes the results no longer statistically significant. This might get a journal to append a note to the original article. But maybe not; they could just ignore it.
It might be that once the coding/data bias is corrected, the sign of an estimated effect flips. This is what Andrew Gelman calls a Type S (sign) error. I really have no idea what a journal would do in this situation. They might append a note or maybe not.
Perhaps the most likely outcome of correcting honest coding/data bias is that the effect size changes. This is what Gelman calls a Type M (magnitude) error. My sense (and experience) is that in a context where novelty is greatly privileged over facts, journal editors will almost certainly ignore this new information. It will be buried.
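The three cases above can be sketched as a small triage function that compares a published estimate with its corrected counterpart. This is only an illustration under stated assumptions: the names `Estimate` and `classify_correction` are hypothetical, the 1.96 cutoff assumes a two-sided test at the 5% level with a normal approximation, and the numbers in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A coefficient and its standard error (hypothetical container)."""
    coef: float
    se: float

    def significant(self, critical_value: float = 1.96) -> bool:
        # Assumption: normal approximation, two-sided test at the 5% level.
        return abs(self.coef / self.se) > critical_value


def classify_correction(published: Estimate, corrected: Estimate,
                        critical_value: float = 1.96) -> str:
    """Label how a correction changes a published result.

    Cases are checked in the order discussed in the text, so a result
    that both loses significance and flips sign is labelled
    "significance lost".
    """
    if published.significant(critical_value) and not corrected.significant(critical_value):
        return "significance lost"
    if published.coef * corrected.coef < 0:
        return "sign flip"        # Type S
    if corrected.coef != published.coef:
        return "magnitude change"  # Type M
    return "no change"


# Made-up numbers, purely for illustration.
published = Estimate(coef=0.5, se=0.1)
print(classify_correction(published, Estimate(coef=0.15, se=0.1)))
print(classify_correction(published, Estimate(coef=-0.4, se=0.1)))
print(classify_correction(published, Estimate(coef=0.3, se=0.1)))
```

A label like this could sit at the top of a correction note, so readers can see at a glance which kind of change the corrected results represent.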
Do political scientists care about effect size?
Due to the complexity of what political scientists study, we rarely (perhaps with the exception of election forecasting) think that we are very close to estimating a given effect’s real magnitude. Most researchers are aiming for statistical significance and a sign that matches their theory.
Does this mean that we don’t care about trying to estimate magnitudes as closely as possible?
Looking at political science practice pre-publication, there is a lot of evidence that we do care about Type M errors. Considerable effort is given to finding new estimation methods that produce less biased results. Questions of omitted variable bias are very common at research seminars and in journal reviews. Most researchers do carefully build their data sets and code to minimise coding/data bias. Sure, many of these efforts are focused on the headline stuff: whether or not a given effect is significant and what the direction of the effect is. But, and perhaps I’m being naive here, these efforts are also part of a desire to make as accurate an estimate of an effect as possible.
However, the review process and journals’ responses to finding Type M errors caused by honest coding/data errors in published findings suggest that perhaps we don’t care about effect size. Reviewers almost never look at code and data. Journals (as far as I know, please correct me if I’m wrong) never append information on replications that find Type M errors to original papers.
I have a simple prescription for demonstrating that we actually care about estimating accurate effect sizes:
Develop a standard practice of including a short, authored write-up of the data/code bias, with corrected results, in the original article’s supplementary materials. Append a notice to the article pointing to this.
Doing this would not only give readers more accurate effect size estimates, but also make replication materials more useful.
Standardising the practice of publishing authored notes will incentivise people to use replication materials, find errors, and publicly correct them. Otherwise, researchers who use replication data and code will keep replicating easy-to-correct errors.