New features in the checkpoint package, version 0.4.0

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

by Andrie de Vries

In 2014 we introduced the checkpoint package for reproducible research. This package makes it easy to use R package versions that existed on CRAN at a given date in the past, and to use different package versions with different projects.

On April 12, 2017, we published version 0.4.0 of checkpoint to CRAN.

The checkpoint() function enables reproducible research by managing your R package versions. These packages are downloaded into a local .checkpoint folder. If you use checkpoint() for many projects, these local packages can consume some storage space, and this update introduces functions to manage your snapshots. In this post I review:

  • Managing local archives:
    • checkpointArchives(): list checkpoint archives on disk.
    • checkpointRemove(): remove checkpoint archive from disk.
    • getAccessDate(): returns the date the snapshot was last accessed.
  • Other:
    • unCheckpoint(): reset .libPaths to the user library to undo the effect of checkpoint().

Setting up an example project

For illustration, set up a script referencing a single package:

library(MASS)
hist(islands)
truehist(islands)

Next, create the checkpoint:

dir.create(file.path(tempdir(), ".checkpoint"), recursive = TRUE)
## Create a checkpoint by specifying a snapshot date
library(checkpoint)
checkpoint("2015-04-26", project = tempdir(), checkpointLocation = tempdir())
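checkpoint() scans the R scripts in the project folder for library() and require() calls, so the example script has to exist as a file in that folder before checkpoint() runs. A minimal sketch of that step, assuming the project folder is tempdir() as above; the file name checkpoint_example_code.R is an arbitrary choice for this illustration:

```r
# Write the example script into the project folder (here tempdir()).
# The file name is hypothetical; any .R file in the project would do.
example_code <- c(
  "library(MASS)",
  "hist(islands)",
  "truehist(islands)"
)
script_path <- file.path(tempdir(), "checkpoint_example_code.R")
writeLines(example_code, script_path)

# Confirm the script is in place and references MASS.
file.exists(script_path)
readLines(script_path)
```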

Working with checkpoint archive snapshots

You can query the available snapshots on disk using the checkpointArchives() function. This returns a vector of snapshot folders.

# List checkpoint archives on disk.
checkpointArchives(tempdir())

## [1] "2015-04-26"

You can get the full paths by including the argument full.names=TRUE:

checkpointArchives(tempdir(), full.names = TRUE)

## [1] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26"
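To make the idea of a "snapshot folder" concrete, here is a base R sketch that lists date-named folders under a .checkpoint root. It is not the package's implementation, only an illustration of the folder layout; the demo root and the date pattern are assumptions of this example:

```r
# Build a throwaway folder layout that mimics a checkpoint root.
root <- file.path(tempfile("demo"), ".checkpoint")
dir.create(file.path(root, "2015-04-26"), recursive = TRUE)
dir.create(file.path(root, "R-3.3.3"))
file.create(file.path(root, "checkpoint_log.csv"))

# Keep only entries whose names look like YYYY-MM-DD snapshot dates.
entries <- list.files(root)
snapshots <- entries[grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", entries)]
snapshots

# Full paths, analogous to full.names = TRUE.
file.path(root, snapshots)
```

In practice, checkpointArchives() is the function to use; the sketch only shows why the log file and the R-version folder do not appear in its result.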

Working with access dates

Every time you use checkpoint(), the function places a small marker with the access date in the snapshot archive. This lets you track when you last used each snapshot archive.

# Returns the date the snapshot was last accessed.
getAccessDate(tempdir())

## C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26 
##                                                           "2017-04-12"
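The access dates come back as a named character vector, so they can be turned into Date values for arithmetic. A small sketch, assuming a result shaped like the output above; the archive path and the reference date are made up for the example:

```r
# Hypothetical result, shaped like the getAccessDate() output above:
# a named character vector of dates, keyed by archive path.
access <- c("/tmp/.checkpoint/2015-04-26" = "2017-04-12")

# Days since last access, relative to a fixed reference date
# (fixed here so the example is reproducible).
reference <- as.Date("2017-05-12")
days_idle <- as.integer(reference - as.Date(access))
names(days_idle) <- basename(names(access))
days_idle
```

In a live session, Sys.Date() would be the natural choice for the reference date.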

Removing a snapshot from local disk

Since the date of last access is tracked, you can use this to manage your checkpoint archives. The function checkpointRemove() deletes archives from disk. You can use this function in several ways. For example, name a specific archive to remove:

# Remove single checkpoint archive from disk.
checkpointRemove("2015-04-26")

You can also remove a range of snapshot archives more recent (or older) than a snapshot date:

# Remove range of checkpoint archives from disk.
checkpointRemove("2015-04-26", allSinceSnapshot = TRUE)
checkpointRemove("2015-04-26", allUntilSnapshot = TRUE)

Finally, you can remove all snapshot archives that have not been accessed since a given date:

# Remove snapshot archives that have not been used recently
checkpointRemove("2015-04-26", notUsedSince = TRUE)
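Because removal is destructive, it can help to preview which snapshots fall on either side of a date before calling checkpointRemove(). The helper below is a hypothetical dry run in base R, not part of the package. It uses strict comparisons, which is a choice made for this sketch only; the post does not say whether the package treats the boundary date as inside the range:

```r
# Hypothetical snapshot names, e.g. as returned by checkpointArchives().
archives <- c("2015-01-15", "2015-04-26", "2016-09-01")

# Return the snapshots strictly older or strictly newer than a cutoff.
snapshots_in_range <- function(archives, cutoff,
                               direction = c("older", "newer")) {
  direction <- match.arg(direction)
  dates <- as.Date(archives)
  cutoff <- as.Date(cutoff)
  keep <- if (direction == "older") dates < cutoff else dates > cutoff
  archives[keep]
}

snapshots_in_range(archives, "2015-04-26", "older")
snapshots_in_range(archives, "2015-04-26", "newer")
```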

Reading the checkpoint log file

One of the side effects of checkpoint() is to create a log file that contains information about packages that get downloaded, as well as the download size. This file is stored in the checkpoint root folder, and is a csv file with column names, so you can read this with your favourite R function or other tools.

dir(file.path(tempdir(), ".checkpoint"))

## [1] "2015-04-26"         "checkpoint_log.csv" "R-3.3.3"

Inspect the log file:

log_file <- file.path(tempdir(), ".checkpoint", "checkpoint_log.csv")
log <- read.csv(log_file)
head(log)

##             timestamp snapshotDate  pkg   bytes
## 1 2017-04-12 15:05:12   2015-04-26 MASS 1084392
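Once the log is in a data frame, the download sizes can be totalled per snapshot. A sketch using the single row shown above as inline sample data, so it runs without a checkpoint folder; the comma-separated layout of the sample text is an assumption based on the column names in the output:

```r
# Sample log text built from the row shown above (layout assumed).
log_text <- "timestamp,snapshotDate,pkg,bytes
2017-04-12 15:05:12,2015-04-26,MASS,1084392"

log <- read.csv(text = log_text, stringsAsFactors = FALSE)

# Total download size per snapshot, in bytes and in megabytes.
totals <- aggregate(bytes ~ snapshotDate, data = log, FUN = sum)
totals$megabytes <- round(totals$bytes / 1024^2, 2)
totals
```

With a real log, replace the read.csv(text = ...) call with read.csv() on the path to checkpoint_log.csv.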

Resetting the checkpoint

In older versions of checkpoint, the only way to undo the effect of checkpoint() was to restart your R session. In v0.3.20 and above, you can use the function unCheckpoint(). This resets your .libPaths to the user library.

.libPaths()

## [1] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26/lib/x86_64-w64-mingw32/3.3.3"
## [2] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/R-3.3.3"                                
## [3] "C:/R/R-33~1.3/library"

Now use unCheckpoint() to reset your library paths:

# Note this is still experimental
unCheckpoint()
.libPaths()

## [1] "C:\\Users\\adevries\\Documents\\R\\win-library"
## [2] "C:/R/R-33~1.3/library"
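For comparison, the same kind of reset can be done by hand with base R by saving the library paths before changing them and restoring them afterwards. This is only a manual sketch of the idea; the post does not describe how unCheckpoint() works internally, and the extra-lib folder is a placeholder for this example:

```r
# Remember the current library paths.
old_paths <- .libPaths()

# Temporarily put an extra (empty) library folder first in the search path.
extra_lib <- file.path(tempdir(), "extra-lib")
dir.create(extra_lib, showWarnings = FALSE)
.libPaths(c(extra_lib, old_paths))
.libPaths()[1]

# Restore the saved paths; the extra folder is no longer searched.
.libPaths(old_paths)
.libPaths()
```

Packages that were already loaded from the temporary library stay loaded, so a fresh session is still the cleanest reset when that matters.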

How to obtain and use checkpoint

Version 0.4.0 of the checkpoint package is available on CRAN now, so you can install it with:

install.packages("checkpoint", repos="https://cloud.r-project.org")

The above command works both for CRAN R, and also for Microsoft R Open (which comes bundled with an older version of checkpoint). For more information on checkpoint, see the vignette Using checkpoint for reproducible research.
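After installing, it can be worth confirming that R picks up the new version, particularly on Microsoft R Open where an older copy is bundled. A small sketch using a hypothetical helper, has_min_version(), which is not part of checkpoint:

```r
# TRUE if `pkg` is installed at `min_version` or later, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= min_version
}

has_min_version("checkpoint", "0.4.0")
```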
