New features in the checkpoint package, version 0.4.0
by Andrie de Vries
In 2014 we introduced the checkpoint package for reproducible research. This package makes it easy to use R package versions that existed on CRAN at a given date in the past, and to use varying package versions with different projects. Previous blog posts include:
- Introducing the Reproducible R Toolkit and the checkpoint package
- An update to the checkpoint package
- New features in checkpoint v0.3.15 now on CRAN
On April 12, 2017, we published version 0.4.0 of checkpoint to CRAN.
The checkpoint() function enables reproducible research by managing your R package versions. These packages are downloaded into a local .checkpoint folder. If you use checkpoint() for many projects, these local packages can consume some storage space, and this update introduces functions to manage your snapshots. In this post I review:
- Managing local archives:
  - checkpointArchives(): list checkpoint archives on disk.
  - checkpointRemove(): remove checkpoint archive from disk.
  - getAccessDate(): returns the date the snapshot was last accessed.
- Other:
  - unCheckpoint(): reset .libPaths to the user library to undo the effect of checkpoint().
Setting up an example project
For illustration, set up a script referencing a single package:
    library(MASS)
    hist(islands)
    truehist(islands)
Next, create the checkpoint:
    dir.create(file.path(tempdir(), ".checkpoint"), recursive = TRUE)

    ## Create a checkpoint by specifying a snapshot date
    library(checkpoint)
    checkpoint("2015-04-26", project = tempdir(), checkpointLocation = tempdir())
Working with checkpoint archive snapshots
You can query the available snapshots on disk using the checkpointArchives() function. This returns a vector of snapshot folders.
    # List checkpoint archives on disk.
    checkpointArchives(tempdir())
    ## [1] "2015-04-26"
You can get the full paths by including the argument full.names = TRUE:
    checkpointArchives(tempdir(), full.names = TRUE)
    ## [1] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26"
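To make the idea of "snapshot folders" concrete, here is a minimal base R sketch that finds date-named folders under a checkpoint root. It is not the package's implementation: the helper name list_snapshot_dirs() and the date pattern are assumptions for illustration, and it runs against a throwaway folder layout rather than a real checkpoint.

```r
# Hypothetical helper: list folders named like YYYY-MM-DD under <root>/.checkpoint
list_snapshot_dirs <- function(root, full.names = FALSE) {
  cp_root <- file.path(root, ".checkpoint")
  entries <- dir(cp_root, full.names = TRUE)
  # Keep directories only, then keep names that look like a snapshot date
  entries <- entries[dir.exists(entries)]
  is_date <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", basename(entries))
  found <- entries[is_date]
  if (full.names) found else basename(found)
}

# Build a throwaway folder layout to try it on (all names are made up)
demo_root <- tempfile("cpdemo")
dir.create(file.path(demo_root, ".checkpoint", "2015-04-26"), recursive = TRUE)
dir.create(file.path(demo_root, ".checkpoint", "2016-01-01"))
dir.create(file.path(demo_root, ".checkpoint", "R-3.3.3"))
file.create(file.path(demo_root, ".checkpoint", "checkpoint_log.csv"))

snapshots <- list_snapshot_dirs(demo_root)
print(snapshots)
```

Filtering on the folder name keeps other entries in the root, such as a log file or an R version folder, out of the result.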
Working with access dates
Every time you use checkpoint(), the function places a small marker with the access date in the snapshot archive. In this way you can track when you last used each snapshot archive.
    # Returns the date the snapshot was last accessed.
    getAccessDate(tempdir())
    ## C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26
    ##                                                           "2017-04-12"
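Since the access dates come back as a named character vector (archive path as name, date as value), a few lines of base R can turn them into an age in days. The sketch below uses made-up paths and dates shaped like the output above, and a fixed reference date so the result is reproducible; none of these values come from a real checkpoint.

```r
# Hypothetical access dates, shaped like the named character vector shown above
access <- c(
  "/tmp/demo/.checkpoint/2015-04-26" = "2017-04-12",
  "/tmp/demo/.checkpoint/2016-01-01" = "2016-02-15"
)

# Fixed reference date for reproducibility; use Sys.Date() in practice
today <- as.Date("2017-05-01")

# Days since each archive was last used, labelled by snapshot folder name
age_days <- as.numeric(today - as.Date(access))
names(age_days) <- basename(names(access))

# Archives that have not been used for more than a year
stale <- names(age_days)[age_days > 365]
print(age_days)
print(stale)
```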
Removing a snapshot from local disk
Since the date of last access is tracked, you can use this to manage your checkpoint archives. The function checkpointRemove() deletes archives from disk. You can use this function in several ways. For example, name a single archive to remove:
    # Remove single checkpoint archive from disk.
    checkpointRemove("2015-04-26")
You can also remove a range of snapshot archives more recent (or older) than a snapshot date:

    # Remove range of checkpoint archives from disk.
    checkpointRemove("2015-04-26", allSinceSnapshot = TRUE)
    checkpointRemove("2015-04-26", allUntilSnapshot = TRUE)
Finally, you can remove all snapshot archives that have not been accessed since a given date:
    # Remove snapshot archives that have not been used recently
    checkpointRemove("2015-04-26", notUsedSince = TRUE)
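Because removal is destructive, it can help to preview which archives a date range would cover before deleting anything. The following is a rough base R sketch, not part of the checkpoint package: the helper preview_range() is hypothetical, the snapshot names are made up, and treating the boundary date as included is an assumption that checkpointRemove() may not share.

```r
# Hypothetical helper: which snapshot names fall on or after / on or before a date?
# Assumption: the boundary date itself is included; checkpointRemove() may differ.
preview_range <- function(snapshots, snapshotDate, direction = c("since", "until")) {
  direction <- match.arg(direction)
  d <- as.Date(snapshots)
  cutoff <- as.Date(snapshotDate)
  keep <- if (direction == "since") d >= cutoff else d <= cutoff
  snapshots[keep]
}

# Snapshot names in the form checkpointArchives() returns (made-up values)
snapshots <- c("2015-01-15", "2015-04-26", "2016-01-01")

since <- preview_range(snapshots, "2015-04-26", "since")
until <- preview_range(snapshots, "2015-04-26", "until")
print(since)
print(until)
```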
Reading the checkpoint log file
One of the side effects of checkpoint() is to create a log file that contains information about packages that get downloaded, as well as the download size. This file is stored in the checkpoint root folder, and is a csv file with column names, so you can read this with your favourite R function or other tools.
    dir(file.path(tempdir(), ".checkpoint"))
    ## [1] "2015-04-26"         "checkpoint_log.csv" "R-3.3.3"
Inspect the log file:
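    # Read the log from the checkpoint root folder (file name as listed above)
    log_file <- read.csv(file.path(tempdir(), ".checkpoint", "checkpoint_log.csv"))
    log_file
    ##             timestamp snapshotDate  pkg   bytes
    ## 1 2017-04-12 15:05:12   2015-04-26 MASS 1084392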
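With the log in a data frame, you can summarise download size per snapshot. The sketch below is self-contained: it writes a small stand-in csv with the same four columns as the output above (the second and third rows are invented for illustration, and the exact formatting of the real log file is assumed), then totals the bytes per snapshot date.

```r
# Write a small stand-in log with the same columns as the output above
# (rows two and three are made up for illustration)
log_path <- tempfile(fileext = ".csv")
writeLines(c(
  "timestamp,snapshotDate,pkg,bytes",
  "2017-04-12 15:05:12,2015-04-26,MASS,1084392",
  "2017-04-12 15:06:01,2016-01-01,MASS,1100000",
  "2017-04-12 15:06:09,2016-01-01,lattice,700000"
), log_path)

log_df <- read.csv(log_path, stringsAsFactors = FALSE)

# Total bytes downloaded per snapshot date, plus a megabyte column for reading
totals <- aggregate(bytes ~ snapshotDate, data = log_df, FUN = sum)
totals$megabytes <- round(totals$bytes / 1024^2, 2)
print(totals)
```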
Resetting the checkpoint
In older versions of checkpoint, the only way to undo the effect of checkpoint() was to restart your R session. In v0.3.20 and above, you can use the function unCheckpoint(), which resets your .libPaths to the user library.
    .libPaths()
    ## [1] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/2015-04-26/lib/x86_64-w64-mingw32/3.3.3"
    ## [2] "C:/Users/adevries/AppData/Local/Temp/RtmpcnciXd/.checkpoint/R-3.3.3"
    ## [3] "C:/R/R-33~1.3/library"

Now use `unCheckpoint()` to reset your library paths:

    # Note this is still experimental
    unCheckpoint()
    .libPaths()
    ## [1] "C:\\Users\\adevries\\Documents\\R\\win-library"
    ## [2] "C:/R/R-33~1.3/library"
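The same save-and-restore idea can be tried in base R without checkpoint at all. This is a minimal sketch of the general technique, not a description of how unCheckpoint() is implemented; the temporary folder stands in for a checkpoint library and is purely illustrative.

```r
# Remember the current library search path
old_paths <- .libPaths()

# Put a throwaway library folder at the front, roughly as a checkpoint would
tmp_lib <- tempfile("lib")
dir.create(tmp_lib)
.libPaths(c(tmp_lib, old_paths))
during <- .libPaths()

# Restore the saved search path
.libPaths(old_paths)
restored <- .libPaths()

print(during[1])
print(identical(restored, old_paths))
```

Saving the value of .libPaths() before changing it means the original search path can be put back without restarting the session.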
How to obtain and use checkpoint
Version 0.4.0 of the checkpoint package is available on CRAN now, so you can install it with:
install.packages("checkpoint", repos="https://cloud.r-project.org")
The above command works for both CRAN R and Microsoft R Open (which comes bundled with an older version of checkpoint). For more information on checkpoint, see the vignette Using checkpoint for reproducible research.