From One Regression to Hundreds Within Seconds: A Shiny Specification Curve

[This article was first published on An Accounting and Data Science Nerd's Corner, and kindly contributed to R-bloggers.]

Online appendices detailing the robustness of empirical analyses are
paramount, but they never let readers explore all reasonable researcher
degrees of freedom. Simonsohn, Simmons and Nelson suggest
a ‘specification curve’ that allows readers to eyeball how a main coefficient
of interest varies across a wide range of specifications. I build on this idea
by making it interactive: a shiny-based web app enables readers to explore the
robustness of findings in detail along the whole curve.

Following up on two blog articles
that introduced the in-development
‘rdfanalysis’ package, this app is a new extension of the package.
In essence, it lets you select which research design choices you want
to display and then redraws the curve on the fly.

In its simplest form, the app only needs a data frame in which each row contains
an estimate and the design choices that produced it. In most cases, you will also
want to include the lower and upper bounds of the estimate so that the
specification curve can display a confidence interval ribbon. As an example, the
first few rows of the data used in the example below look as follows:

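# devtools::install_github("joachim-gassen/rdfanalysis")
library(rdfanalysis)
library(knitr)  # provides kable() for printing the table
# Load the example estimates data frame (ests) from the author's site
load(url("https://joachim-gassen.github.io/data/rdf_ests.RData"))
kable(head(ests), format.args = list(digits = 3))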
| na.omit | idvs | outlier_tment_style | outlier_cutoff | model_type | feffect | cluster | est | lb | ub |
|:--|:--|:--|--:|:--|:--|:--|--:|--:|--:|
| yes | gdp_only | win | 0 | level-level | none | none | 0.362 | 0.3466 | 0.3783 |
| no | gdp_only | win | 0 | level-level | none | none | 0.386 | 0.3737 | 0.3983 |
| yes | gdp_school | win | 0 | level-level | none | none | 0.131 | 0.1159 | 0.1452 |
| no | gdp_school | win | 0 | level-level | none | none | 0.075 | 0.0644 | 0.0855 |
| yes | gdp_ue | win | 0 | level-level | none | none | 0.370 | 0.3543 | 0.3860 |
| no | gdp_ue | win | 0 | level-level | none | none | 0.325 | 0.3116 | 0.3378 |

This is the type of data that exhaust_design() from the rdfanalysis package
generates. If you create your own data, you need to tell
plot_rdf_spec_curve() which columns of your data frame contain the choices. You
do this by setting the "choices" attribute of the data frame. In the case above,
the choices are in columns 1 to 7, so you would set the attribute as follows:

attr(ests, "choices") <- 1:7
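If you do not use exhaust_design(), you can assemble such a data frame yourself. The following is a minimal base-R sketch under assumed conditions, not the rdfanalysis workflow: it uses the built-in mtcars data, two hypothetical choice dimensions (which controls to include and whether to log the outcome), and lm() with confint() for 95% bounds. Only the "choices" attribute mirrors what is described above.

```r
# Two hypothetical design choices, crossed into a grid of specifications
choice_grid <- expand.grid(
  controls = c("wt_only", "wt_hp"),
  dv_trans = c("level", "log"),
  stringsAsFactors = FALSE
)

# Fit one model per specification; keep the wt coefficient and its 95% CI
estimate_spec <- function(controls, dv_trans) {
  df <- mtcars
  df$y <- if (dv_trans == "log") log(df$mpg) else df$mpg
  rhs <- if (controls == "wt_only") "wt" else "wt + hp"
  fit <- lm(as.formula(paste("y ~", rhs)), data = df)
  ci <- confint(fit, "wt", level = 0.95)
  c(est = unname(coef(fit)["wt"]), lb = ci[1, 1], ub = ci[1, 2])
}

# Stack the results: one row per specification with columns est, lb, ub
res <- do.call(rbind, lapply(seq_len(nrow(choice_grid)), function(i)
  estimate_spec(choice_grid$controls[i], choice_grid$dv_trans[i])))

# Choice columns first, then the estimates
ests_own <- cbind(choice_grid, res)
attr(ests_own, "choices") <- 1:2  # columns 1 and 2 hold the choices
```

The choice names and the model are placeholders; in a real project each row would come from your own estimation code.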

Once you have such a data frame, you can plot your specification curve:

plot_rdf_spec_curve(ests, "est", "lb", "ub") 
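For readers curious about what the plot encodes, here is a rough base-R sketch of the general idea behind a specification curve. It is not the rdfanalysis implementation: specifications are sorted by their estimate and plotted by rank, with the bounds drawn as a shaded ribbon. The helper name spec_curve_base() is hypothetical; the toy data reuse the six estimates from the table above.

```r
# Hypothetical helper: rank specifications by estimate and draw the curve
spec_curve_base <- function(df, est = "est", lb = "lb", ub = "ub") {
  df <- df[order(df[[est]]), , drop = FALSE]  # sort rows by the estimate
  df$rank <- seq_len(nrow(df))
  plot(df$rank, df[[est]], type = "n",
       ylim = range(c(0, df[[lb]], df[[ub]])),
       xlab = "Specification (ranked by estimate)", ylab = "Estimate")
  # Confidence interval ribbon between the lower and upper bounds
  polygon(c(df$rank, rev(df$rank)), c(df[[lb]], rev(df[[ub]])),
          col = "grey85", border = NA)
  lines(df$rank, df[[est]], lwd = 2)
  abline(h = 0, lty = 2)  # reference line at zero
  invisible(df)
}

toy <- data.frame(
  est = c(0.362, 0.386, 0.131, 0.075, 0.370, 0.325),
  lb  = c(0.3466, 0.3737, 0.1159, 0.0644, 0.3543, 0.3116),
  ub  = c(0.3783, 0.3983, 0.1452, 0.0855, 0.3860, 0.3378)
)
sorted <- spec_curve_base(toy)
```

The real plot_rdf_spec_curve() additionally displays which choices underlie each point on the curve, which this sketch leaves out.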

Nice. But how does one create the interactive display? Easy. Just call
shiny_rdf_spec_curve(), passing your data and, as a list, the additional
parameters that you would hand over to plot_rdf_spec_curve():

shiny_rdf_spec_curve(ests, list("est", "lb", "ub")) 

The app takes a while to display the initial specification curve because it
is based on 11,264 specifications. Once you start to drill down, it becomes
more responsive.
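Drilling down amounts to restricting each choice column to a subset of its values and redrawing the curve from the remaining rows, which is why fewer specifications mean faster redraws. A minimal base-R sketch of that filtering step, assuming a hypothetical helper filter_specs() and a named list of selected values per choice column (in a Shiny app, such a step would typically sit inside a reactive expression; how rdfanalysis does it internally may differ):

```r
# Hypothetical helper: keep only specifications whose choices are selected
filter_specs <- function(df, selected) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(selected)) {
    keep <- keep & df[[col]] %in% selected[[col]]
  }
  out <- df[keep, , drop = FALSE]
  attr(out, "choices") <- attr(df, "choices")  # carry the attribute over
  out
}

# Toy data: two choice columns and the estimates from the table above
specs <- data.frame(
  na.omit = c("yes", "no", "yes", "no", "yes", "no"),
  idvs    = c("gdp_only", "gdp_only", "gdp_school", "gdp_school",
              "gdp_ue", "gdp_ue"),
  est     = c(0.362, 0.386, 0.131, 0.075, 0.370, 0.325)
)
attr(specs, "choices") <- 1:2

sub <- filter_specs(specs, list(na.omit = "no",
                                idvs = c("gdp_only", "gdp_ue")))
nrow(sub)  # 2
```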

When you focus on only a few specifications, you might think “Hey, this is nice,
but I would rather see the actual regression results for these cases”.
This can be done! You can use the workflow of the rdfanalysis package so that
the app presents the actual model results as soon as you have zoomed in on a
handful of specifications. While you are at it, you can also specify your
preferred specification (e.g., the one that you presented in your paper).

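# Define the research design as a sequence of steps; rel_dir points
# to the directory holding the code for these steps
design <- define_design(steps = c("read_data",
                                  "select_idvs",
                                  "treat_extreme_obs",
                                  "specify_model",
                                  "est_model"),
                        rel_dir = "vignettes/case_study_code")

shiny_rdf_spec_curve(
  ests, list("est", "lb", "ub"),
  # Design, code directory and input data, used to re-estimate models
  design, "vignettes/case_study_code",
  "https://joachim-gassen.github.io/data/wb_new.csv",
  # Preferred specification: one value for each of the seven choice columns
  default_choices = list(na.omit = "no",
                         idvs = "full",
                         outlier_tment_style = "win",
                         model_type = "level-log",
                         outlier_cutoff = 0,
                         feffect = "ctryyear",
                         cluster = "ctryyear"))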

Please note that the code above will only run when you have forked the
rdfanalysis repo and set the working directory to its root.
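The default_choices list pins down exactly one value per choice column, so it identifies a single row of the estimates data frame. A hedged base-R sketch of how such a preferred specification could be located; the helper find_spec() and the toy data are hypothetical and not part of rdfanalysis:

```r
# Hypothetical helper: return the row index(es) matching one value per choice
find_spec <- function(df, choices) {
  hit <- Reduce(`&`, Map(function(col, val) df[[col]] == val,
                         names(choices), choices))
  which(hit)
}

# Toy data: two choice columns and the estimates from the table above
specs <- data.frame(
  na.omit = c("yes", "no", "yes", "no", "yes", "no"),
  idvs    = c("gdp_only", "gdp_only", "gdp_school", "gdp_school",
              "gdp_ue", "gdp_ue"),
  est     = c(0.362, 0.386, 0.131, 0.075, 0.370, 0.325)
)

preferred <- find_spec(specs, list(na.omit = "no", idvs = "gdp_school"))
specs[preferred, ]  # the row with est = 0.075
```

If the list leaves a choice column out, several rows can match, which is why a preferred specification should name a value for every choice.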

Finally, you can add a title and a short info text by setting the title and
abstract parameters and, voilà: your interactive and exhaustive robustness
section.

[Embedded app: A Shiny Specification Curve]

Kudos to Nate Breznau for bringing up the idea of using shiny to visualize the
specification curve. See my earlier post for more details on the case and on how to drill deeper into the findings. Feel free to use the in-development ‘rdfanalysis’ package to exhaust the researcher degrees of freedom in your own projects. If you have remarks about this project, I would love to hear from you. Use the comment section below or reach out via email or Twitter.

Enjoy!
