R Package Quality: Documentation

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.]

This is part three of a five-part series of related posts on validating R packages.

In this post, we’ll take a closer look at package documentation and how it helps assess the “risky-ness” of a package. The documentation score evaluates how complete and helpful a package’s documentation is. Package documentation comes in many guises: function examples, vignettes, or even a website. We don’t believe every package must have a website, vignettes, and examples, but the absence of all three usually points to weak documentation.

When validating R packages, documentation contributes around 15% of the total score.

Score 1: Exported Objects Documentation

A score based on the proportion of exported objects that have documentation. For example, if we have ten functions, but only eight are documented, then the score would be 0.8. For all packages on CRAN, this is almost certainly 1, but for packages that are only available on GitHub, this may be less.
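The calculation can be sketched in a few lines of R. This is only an illustration: `doc_coverage()` is a hypothetical helper name, and the two input vectors stand in for whatever list of exports and documented topics a real scoring tool would extract from the package.

```r
# Hypothetical helper: proportion of exported objects that are documented.
doc_coverage <- function(exported, documented) {
  if (length(exported) == 0) {
    return(NA_real_)  # nothing exported, so the proportion is undefined
  }
  mean(exported %in% documented)
}

exported <- paste0("fn_", 1:10)  # ten exported functions
documented <- exported[1:8]      # only eight have help pages
doc_coverage(exported, documented)
#> [1] 0.8
```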

Score 2: Proportion of Help Pages with Examples

A score based on the proportion of help pages that have examples.
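One way to approximate this for an installed package is to read its help pages with `tools::Rd_db()` and look for an `\examples` section in each. The helper names `has_examples()` and `example_score()` are our own for this sketch, and the exact rules a scoring tool applies (for instance, how it treats data sets or re-exports) may well differ.

```r
# Does a parsed Rd object contain an \examples section?
# Top-level Rd elements carry their section name in the "Rd_tag" attribute.
has_examples <- function(rd) {
  tags <- unlist(lapply(rd, attr, "Rd_tag"))
  "\\examples" %in% tags
}

# Hypothetical helper: proportion of help pages with examples
# for an installed package.
example_score <- function(pkg) {
  db <- tools::Rd_db(pkg)
  if (length(db) == 0) {
    return(NA_real_)  # no help pages at all
  }
  mean(vapply(db, has_examples, logical(1)))
}

example_score("stats")
```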

Score 3: NEWS file

A NEWS file is an indication of a development and release cycle. It helps users understand what has changed between versions. This score detects the presence of a NEWS file. Of course, R packages make this interesting with NEWS, NEWS.md, inst/NEWS.md and/or changelogs!
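A minimal sketch of such a check against a package source directory might look like this. The function name `has_news()` is hypothetical, and the candidate list is an assumption: the first three entries come from the locations named above, while "ChangeLog" is just one plausible changelog file name.

```r
# Candidate locations, relative to the package source directory.
# The first three come from the text; "ChangeLog" is an assumption.
news_candidates <- c("NEWS", "NEWS.md", file.path("inst", "NEWS.md"), "ChangeLog")

# Hypothetical helper: 1 if any candidate file exists, 0 otherwise.
has_news <- function(pkg_dir, candidates = news_candidates) {
  as.numeric(any(file.exists(file.path(pkg_dir, candidates))))
}

# A throwaway directory standing in for a package source tree
pkg_dir <- tempfile("pkg")
dir.create(pkg_dir)
has_news(pkg_dir)
#> [1] 0

writeLines("# mypkg 0.1.0", file.path(pkg_dir, "NEWS.md"))
has_news(pkg_dir)
#> [1] 1
```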

Score 4: Vignettes

Around 40% of CRAN packages have a single vignette, with only 10% having more than one vignette – we checked! For simplicity, this score is a simple binary metric, based on whether a package has any vignettes.
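For an installed package, the binary metric could be approximated by asking R which vignettes it knows about. `vignette_score()` is a hypothetical name, and this sketch only sees vignettes that were actually built and installed, which may not match what a source-based check would find.

```r
# Hypothetical helper: 1 if an installed package ships any vignette, else 0.
vignette_score <- function(pkg) {
  # With no topic given, vignette() returns a listing whose `results`
  # matrix has one row per vignette found.
  found <- utils::vignette(package = pkg)$results
  as.numeric(nrow(found) > 0)
}

vignette_score("stats")
```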

Score 5: Package Website

Does a package have an associated website? Ten or fifteen years ago, package websites were rare. Today, GitHub and GitLab make it easy for a package to host a website.

Score 6: NEWS updated to the Current version

This score checks whether the NEWS file mentions the package’s current version. An outdated or missing NEWS file makes it challenging to track recent changes, bug fixes, or updates. This lack of transparency may pose risks, as users are unable to verify whether critical updates have been implemented.
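A simple way to test for this is to read the version from DESCRIPTION and search the NEWS file for it. The sketch below assumes a single NEWS.md at the top of the source tree, and `news_is_current()` is a hypothetical name. It uses a plain substring match, so it is deliberately crude: version 1.2.0 would also match an entry for 11.2.0.

```r
# Hypothetical helper: 1 if the NEWS file mentions the version in DESCRIPTION.
news_is_current <- function(pkg_dir, news_file = "NEWS.md") {
  news_path <- file.path(pkg_dir, news_file)
  if (!file.exists(news_path)) {
    return(0)  # a missing NEWS file cannot be current
  }
  version <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Version")[1, 1]
  # fixed = TRUE so the dots in the version string are matched literally
  as.numeric(any(grepl(version, readLines(news_path), fixed = TRUE)))
}

# A throwaway package skeleton whose NEWS stops one release short
pkg_dir <- tempfile("pkg")
dir.create(pkg_dir)
writeLines(c("Package: mypkg", "Version: 1.2.0"), file.path(pkg_dir, "DESCRIPTION"))
writeLines(c("# mypkg 1.1.0", "", "* Fixed a bug."), file.path(pkg_dir, "NEWS.md"))
news_is_current(pkg_dir)
#> [1] 0
```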

Summary

We can all agree that a package doesn’t need all of the components described above. It’s perfectly reasonable to have few examples, but very detailed vignettes. The important point is to investigate packages that have little documentation.

Examples

Using the packages from the previous blog post, and omitting scores where all packages scored 1, we have the following results:

Package          News Current   Vignettes   Examples
drat             1.00           1.00        0.43
microbenchmark   0.00           0.00        0.20
shinyjs          1.00           1.00        0.90
tibble           1.00           1.00        0.61
tsibble          0.00           1.00        0.82

All packages use source control, have a package website and provide documentation. The {microbenchmark} package doesn’t have a NEWS file or changelog. Similarly, it’s missing vignettes. But recall it still has a high overall package score. The idea behind litmus isn’t that a package must be perfect, but to take a pragmatic approach to scoring.

Oddly, the {tsibble} package does have a NEWS file, but it doesn’t mention the latest version. I think this was an oversight.
