HTTPS for CRAN: how and why

[This article was first published on OpenCPU, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

opencpu logo

R gained some basic support for https in version 3.2.0 (see NEWS) via the method = "libcurl" argument in base functions download.file and url. The global option download.file.method is used to make this the default.

Unfortunately the implementation has a few limitations: there is no way to set request options (authentication, proxy, headers, TLS options, etc) and the functions do not expose an http status code or response headers. Because they also do not raise an error when the request fails with an http error (as do the other download methods), this leaves you to guess if the retrieved content is what you were expecting or an error page.

# Raises an error
download.file("http://httpbin.org/status/418", tempfile(), method = "internal")

# Does not raise an error
download.file("http://httpbin.org/status/418", tempfile(), method = "libcurl")

# What it should do
library(curl)
curl_download("http://httpbin.org/status/418", tempfile())

Anyway it is good enough for downloading static files from public servers, which is all we need for now.

CRAN and libcurl

Because install.packages and friends wrap around download.file, we can use this new feature to download R packages from CRAN via https. None of the currently available CRAN servers seems to support https, so I created a demo server at https://cran.opencpu.org. This is not a real mirror, it is just a https proxy to the US mirror.

# Install a package over https
install.packages("ggplot2", repos = "https://cran.opencpu.org", method = "libcurl")

Use a script like this to opt-in globally on machines where libcurl is available:

# Enable CRAN https everywhere
if(capabilities("libcurl")){
  options(repos = "https://cran.opencpu.org", download.file.method = "libcurl")
}

Hopefully the admins in Vienna will at some point enable https for the main cran server in the same way they have done for r-forge (which is literally the neighborhing ip address).

Why CRAN and https?

Using https can stop some, but not all, MITM attacks. Encrypting the connection with the CRAN server prevents intermediate parties such as your ISP, (anti)virus, or any other user on your network from snooping or tampering with the connection. When it comes to CRAN, security is probably more of a concern than privacy, especially when using public networks on e.g. airports, coffee shops or campuses. It is easy for hackers or viruses to hijack wifi connections and inject malicious code or executables into unencrypted traffic. Using https guarantees that at least the connection between you and your CRAN mirror is secure.

Of course this does not fully guarantee the integrity of your download. You are basically putting your faith in the hands of your CRAN mirror (or the owner of the domain to be more specific). If the mirror server gets hacked, or somebody manages to tamper with the mirroring process itself (which is done using rsync without any encryption) packages can still get infected.

Linux distributions solve this problem by making package authors sign the checksum of the package with a private key. This signature is used to automatically verify the integrity of a download from the author’s public key before installation, regardless of how the package was obtained. Simon has implemented some of this for R in PKI but unfortunately this was never adopted by CRAN. But at least with https we can somewhat safely install R packages from within a coffee shop now, which solves the most urgent problem.

To leave a comment for the author, please follow the link and comment on their blog: OpenCPU.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)