Use GitHub Vulnerability Alerts to Keep Users of Your R Packages Safe

[This article was first published on R – rud.is, and kindly contributed to R-bloggers.]

Despite their now inherent evil status, GitHub has some tools other repository aggregators do not. One such tool is the free vulnerability alert service, which scans repositories for outdated and vulnerable dependencies.

Now, “R” is nowhere near a first-class citizen on the internet writ large, and that includes software development tooling (e.g. the Travis-CI and GitLab continuous integration recipes are community maintained vs a first-class/supported offering). This also means that GitHub’s service will never check for, nor alert on, security issues in a pure R package. That is mostly because only a teensy few of us even bother to check packages for issues once in a while, and there’s no real way to report said issues into the CVE process easily (though I guess I could, given that my $DAYJOB is an official CVE issuer). So the integrity & safety of the R package ecosystem is still in the “trust me, everything’s fine!!” state. Given that, any extra way to keep even some R packages less insecure is great.

So, right now you’re thinking “you click-baited us with a title that your lede just said isn’t possible…WTHeck?!”

It’s true that GitHub does not consider R a first-class citizen, but it does support Java and:

    library(magrittr) # provides the %>% pipe

    available.packages() %>% 
      tibble::as_tibble() %>% 
      tidyr::separate_rows(Imports, sep=",[[:space:]]*") %>% # we really just
      tidyr::separate_rows(Depends, sep=",[[:space:]]*") %>% # need these two
      dplyr::select(Package, Imports, Depends) %>% 
      dplyr::filter(
        # match rJava as a whole package name, with or without a version constraint
        grepl("^\\s*rJava\\b", Imports) | grepl("^\\s*rJava\\b", Depends)
      ) %>% 
      dplyr::distinct(Package) %>% 
      dplyr::summarise(total_pkgs_using_rjava = dplyr::n())
    ## Count as published in the original post; it will differ with the
    ## CRAN snapshot you query.
    ## # A tibble: 1 x 1
    ##   total_pkgs_using_rjava
    ##                    <int>
    ## 1                     66

According to ☝, there are 66 CRAN packages that require rJava, seven of which explicitly provide only JARs (a JAR is a compressed directory tree of supporting Java classes). There are more CRAN-unpublished rJava-based projects on GitLab & GitHub, but it’s likely that public-facing rJava packages that include or depend on public JAR-dependent projects still number fewer than ~200. Given the now >13K packages on CRAN, this is a tiny subset, but with the sorry state of R security, anything is better than nothing.
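To see what that kind of dependency filter is matching without hitting a CRAN mirror, here is a small base-R sketch on made-up data. The helper name `requires_rjava()` and the toy package names are hypothetical and not from the original post. The helper splits a dependency field on commas and strips version constraints before comparing names, so a package such as `rJavaTools` would not be counted by accident.

```r
# Hypothetical helper (not from the original post): TRUE when a single
# comma-separated dependency field lists rJava as a package name.
requires_rjava <- function(field) {
  if (is.na(field)) return(FALSE)
  deps <- strsplit(field, ",[[:space:]]*")[[1]]
  # Drop version constraints such as " (>= 0.9-8)" and surrounding whitespace
  deps <- trimws(sub("\\(.*$", "", deps))
  "rJava" %in% deps
}

# Toy stand-in for the fields returned by available.packages()
pkgs <- data.frame(
  Package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  Imports = c("rJava (>= 0.9-8), utils", NA, "rJavaTools", "stats"),
  Depends = c(NA, "R (>= 3.2.0), rJava", NA, NA),
  stringsAsFactors = FALSE
)

# A package "requires" rJava if it appears in either Imports or Depends
uses <- vapply(pkgs$Imports, requires_rjava, logical(1), USE.NAMES = FALSE) |
  vapply(pkgs$Depends, requires_rjava, logical(1), USE.NAMES = FALSE)

rjava_pkgs <- pkgs$Package[uses]
print(rjava_pkgs)
```

Only `pkgA` and `pkgB` are selected here, since they are the only toy rows that name rJava itself in Imports or Depends.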

Having said that, one big reason (IMO) for the lack of Java-wrapped CRAN or “devtools”-only released rJava-dependent packages is that it’s 2018 and you still have better odds of winning a Vegas jackpot than you do of getting rJava to work on your workstation in fewer than 4 tries, especially after an OS upgrade. That’s sad, since there are many wonderful, solid and useful Java libraries that would be super-handy for many workflows, yet most of us package writers (myself included) prefer to spin our wheels getting C++ or Rust libraries working with R rather than try to make it easier for regular R users to tap into that rich Java ecosystem.

But, I digress.

For the handful of us that do write and use rJava-based packages, we can better serve our userbase by deliberately putting those R+Java repos on GitHub. Now, I hear you. They’re evil and by doing this one of the most evil corporations on the planet can make money with your metadata (and, possibly just blatantly steal your code for use in-product without credit) but I’ll give that up on a case-by-case basis to make it easier to keep users safe.

Why will this enhance safety? Go take a look at one of my non-CRAN rJava-backed packages: pdfbox. It has this awesome “in-your-face” security warning banner:

[Image: GitHub security alert banner on the pdfbox repository]

The vulnerability is CVE-2018-11797, which is baseline computed to be “high severity” with the following specific weakness: “In Apache PDFBox 1.8.0 to 1.8.15 and 2.0.0RC1 to 2.0.11, a carefully crafted PDF file can trigger an extremely long running computation when parsing the page tree.” So, it’s a process denial of service vulnerability. You’ll also note I haven’t updated the JARs yet (mostly since it’s not a code-execution vulnerability).
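The affected ranges in that description can be checked against bundled JAR file names with a few lines of base R. This is only a sketch under stated assumptions: the helper names `jar_version()` and `is_affected()` are hypothetical, the version is assumed to be embedded in the file name (e.g. `pdfbox-2.0.11.jar`), and the release candidate `2.0.0RC1` is treated as `2.0.0` because `numeric_version()` cannot parse the `RC1` suffix.

```r
# Affected ranges quoted in the CVE-2018-11797 description.
# Assumption: "2.0.0RC1" is approximated as 2.0.0 for comparison purposes.
affected <- list(
  c(lo = "1.8.0", hi = "1.8.15"),
  c(lo = "2.0.0", hi = "2.0.11")
)

# Hypothetical helper: pull "x.y.z" out of a name like "pdfbox-2.0.11.jar"
jar_version <- function(jar) {
  m <- regmatches(jar, regexpr("[0-9]+(\\.[0-9]+)+", jar))
  numeric_version(m)
}

# Hypothetical helper: TRUE when the JAR's version falls in any affected range
is_affected <- function(jar) {
  v <- jar_version(jar)
  any(vapply(
    affected,
    function(r) v >= numeric_version(r[["lo"]]) && v <= numeric_version(r[["hi"]]),
    logical(1)
  ))
}

jars <- c("pdfbox-2.0.11.jar", "pdfbox-2.0.12.jar", "pdfbox-1.8.15.jar")
status <- vapply(jars, is_affected, logical(1))
print(status)
```

For an installed package, the JAR names could be gathered with something like `list.files(system.file("java", package = "pdfbox"), pattern = "\\.jar$")`, assuming the JARs live in the package's `java` directory as is usual for rJava packages.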

I knew about this 28 days ago (I’ve been incredibly busy and there’s a lot of blather required to talk about it, hence the delay in blogging) thanks to the GitHub service, and will resolve it when I get some free time over the Thanksgiving break. I received an alert for this; there are hooks for security alerts (so one can auto-create an issue); and there’s a warning for users, any of whom could file an issue to let me know it’s super-important to them that I get it fixed (or they could be super-awesome and file a PR :-).

FIN

The TLDR is (first) a note to package authors who use rJava: bite the GitHub bullet and take advantage of this free service; and (second) a note to users: encourage the authors of packages you use to adopt this service, and keep a watchful eye out for any security alerts for code you depend on to get things done.

A (perhaps) third and final note is for all of us to be continually mindful of the safety & integrity of the R package ecosystem and do what we can to keep moving it forward.
