Software Dependencies and Risk

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard-to-manage risks.

If your software or research depends on many complex and changing packages, you have no way to establish that your work is correct. This is because, to establish the correctness of your work, you would also need to establish the correctness of all of its dependencies. This is worse than having non-reproducible research, as your work may in fact have been wrong even the first time.

Work with few dependencies, or with low-complexity dependencies, can also be wrong. But in that case there is at least the possibility of checking things, or of running down and fixing issues.

This is one reason we at Win-Vector LLC have been working on low-dependency R packages for data analysis. We don’t intend to control the whole analysis stack (that would be unethical), but we do intend to be in a good position to fix things for our partners and clients. The bulk of our system’s utility comes from external systems such as R itself, the data.table package, and Rcpp. So we must (and hopefully do) give credit and thanks.

Also, not all dependencies are equal. So we have had to avoid some popular packages with unstable APIs (a history of breaking changes) and high historic error rates (a history of complexity and adding features over fixing things).

Again, dependencies are but one measure of quality and at best an approximation. But let’s take a look at some of our packages through this lens.

And, almost as if to make the point, the one package where we relaxed the above discipline currently has a CRAN-flagged issue (“significant warnings”) that we cannot fix, as the issue in fact comes from one of its dependencies.

WVPlots


The issue is likely from ggplot2, which itself is likely picking up issues and errors from dplyr, tibble, and rlang (a few of ggplot2’s dependencies that currently have detected, yet unfixed, issues on CRAN). And these packages are in turn likely picking up issues from their own direct and indirect dependencies.
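
As a rough illustration of how indirect dependencies pile up, here is a minimal sketch in base R. The package names and the dependency table are hypothetical, made up for the example; they are not measurements of any real package. For real packages, `tools::package_dependencies()` with `recursive = TRUE` reports this kind of information from repository metadata.

```r
# Hypothetical direct-dependency table (made-up package names).
# Each entry lists the packages that the named package depends on directly.
direct_deps <- list(
  smallpkg = c("corelib"),
  bigpkg   = c("plotlib", "corelib"),
  plotlib  = c("framelib", "langlib"),
  framelib = c("langlib"),
  langlib  = character(0),
  corelib  = character(0)
)

# Collect every package reachable from `pkg` by following dependency edges,
# that is, its direct and indirect dependencies.
recursive_deps <- function(pkg, deps) {
  seen <- character(0)
  todo <- deps[[pkg]]
  while (length(todo) > 0) {
    nxt <- todo[[1]]
    todo <- todo[-1]
    if (!(nxt %in% seen)) {
      seen <- c(seen, nxt)
      todo <- c(todo, deps[[nxt]])  # queue the dependencies of this dependency
    }
  }
  sort(seen)
}

# Number of direct plus indirect dependencies for each package in the table.
dep_counts <- vapply(
  names(direct_deps),
  function(p) length(recursive_deps(p, direct_deps)),
  integer(1)
)
print(dep_counts)
```

In this toy table `bigpkg` lists only two direct dependencies, yet any one of four packages could hand it a problem it has no lever to fix.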

Now these issues are probably not serious; if they were, there would be a great panic motivating teams to fix them. (This is a neat example of survivorship bias: visible acute problems attract enough attention to be fixed quickly, but subtle chronic issues can often live a long time.) But the point is: we have no lever to fix them on our end.
