Reproducible Econometric Research

August 25, 2011

(This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers)

I doubt if anyone would deny the importance of being able to reproduce one's econometric results. More importantly, other researchers should be able to reproduce our results in order to (a) verify that we've done what we said we did; (b) investigate the sensitivity of our results to the various choices we made (e.g., functional form of our model, choice of sample period, etc.); and (c) satisfy themselves that they understand our analysis.

However, if you've ever tried to literally reproduce someone else's econometric results, you'll know that it's not always that easy to do so - even if they supply you with their data-set. You really need to have their code (R, EViews, Stata, Gauss) as well. That's why I include both Data and Code pages with this blog.

Students of econometrics really shouldn't under-estimate the importance of the replicability of results. As tedious as it can be, it's really important to fully document the steps you take when "cleaning" your data prior to your empirical modelling. It's also sensible to document "bad" results as well as "good" results, if only for your own benefit when you inevitably have to re-visit your work at a later date. (Generally a much later date if you've submitted your work to a typical economics journal!)

There's been a move on the part of some academic journals towards asking or requiring authors of empirical papers to supply their data and code as a condition of acceptance of their work for publication. This material is then housed in an on-line repository that anyone can access.

Of course, exceptions sometimes have to be made - for example, when the data are proprietary and can't be released publicly. But these should be exceptions.

For example, the Journal of Applied Econometrics made the supply of data and code mandatory several years ago, and now has a very valuable Data Archive. I know that at least one of my colleagues here at UVic makes good use of this archive in her graduate teaching. That same journal also has a section for replication studies - we need more of this.

At the Journal of International Trade & Economic Development we introduced a data and code repository earlier this year. It's managed by Judith Clarke, and at this point it operates on a voluntary basis. I think we're going to have to make it mandatory, though - so far, no authors have volunteered to upload their files! I guess that incentives have something to do with this.

As far as I'm concerned, it's also perfectly reasonable to ask to see data and code when you're refereeing a paper for a journal. It should go without saying that such requests need to be made via the handling editor/associate editor. I ask for data and code quite frequently in connection with refereeing tasks, and it can lead to some interesting outcomes - believe me!

A while back, Jeff Racine drew my attention to Sweave, and kindly demonstrated some of its capabilities. So what's Sweave? I can't do better than to quote from the associated website:

"Sweave is a tool that allows to embed the R code for complete data analyses in latex documents. The purpose is to create dynamic reports, which can be updated automatically if data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final latex document. The report can be automatically updated if data or analysis change, which allows for truly reproducible research."

Notice that part of the appeal of Sweave is that it lends itself to the replicability of research results.
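
To make that concrete, here is a minimal sketch of what a Sweave source file might look like. It isn't taken from Jeff's demonstration or from the Sweave website: the file name (report.Rnw), the chunk labels, and the use of R's built-in "cars" data are all assumptions made purely for illustration.

```latex
% report.Rnw -- hypothetical file name; a minimal illustrative sketch only
\documentclass{article}
\begin{document}

\section*{A toy regression}

% An R code chunk: it opens with <<label, options>>= and closes with @
<<model, echo=TRUE>>=
# R's built-in 'cars' data stands in for a real data-set (an assumption)
fit <- lm(dist ~ speed, data = cars)
summary(fit)
@

% \Sexpr{} places the value of an R expression directly in the text
The estimated slope coefficient is \Sexpr{round(coef(fit)["speed"], 3)}.

% fig=TRUE tells Sweave to save the plot and include it in the document
<<scatter, fig=TRUE, echo=FALSE>>=
plot(dist ~ speed, data = cars)
abline(fit)
@

\end{document}
```

Running Sweave("report.Rnw") in R should produce report.tex, which can then be compiled with LaTeX in the usual way. If the data or the model specification change, re-running those two steps regenerates every number and graph in the report, and that is precisely the replicability point.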

Incidentally, Jeff has a piece scheduled to appear in the Journal of Applied Econometrics about RStudio (see my earlier post here), R, and Sweave. Watch out for it.

© 2011, David E. Giles
