Rperform: Obtaining quantitative metrics across R package versions

June 26, 2015

(This article was first published on Tech and Mortals » R, and kindly contributed to R-bloggers)

Almost a month after the start of the official coding period, the Rperform package is ready with its initial suite of functions. Building on the proof-of-concept code provided by Toby Dylan Hocking, my mentor, I have written a set of functions for obtaining and plotting time metrics for individual test files over successive git versions, and for obtaining and plotting time metrics for files across two given branches. This part was relatively easy and straightforward compared to obtaining the memory metrics, which are a work in progress at the moment.

The memory metrics posed an issue because R's memory management (involving gc()) didn’t allow for accurate capture of the memory in use at a particular point in time. However, with Toby’s help and some insights from Hadley, I have been able to obtain the memory metrics for individual test files. The trick was to start a new process through the command line for each individual commit rather than trying to obtain the metrics from within the current R process. The code for obtaining memory usage for individual testthat blocks within the files is still being worked out.
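
The fresh-process idea can be sketched roughly as follows. This is a minimal illustration, not Rperform's implementation: the helper name peak_mem_fresh_process() is hypothetical, it reports the peak memory tracked by gc() in the child process rather than the swap and leak figures that mem_compare() prints, and checking out each commit before the measurement is left out.

```r
# Hypothetical helper (not part of Rperform): run a file in a brand-new
# Rscript process and return the peak memory (in Mb) that gc() recorded there.
# Assumes the file can be run on its own with source().
peak_mem_fresh_process <- function(test_file) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(c(
    # Reset the "max used" counters so only the sourced file is measured.
    "invisible(gc(reset = TRUE))",
    sprintf("source(%s)", deparse(normalizePath(test_file))),
    "g <- gc()",
    # The last column of gc()'s matrix is "max used" in Mb (Ncells + Vcells).
    "cat(sprintf('\\nPEAK_MB %.1f\\n', sum(g[, ncol(g)])))"
  ), script)
  rscript <- file.path(R.home("bin"), "Rscript")
  out <- system2(rscript, c("--vanilla", shQuote(script)), stdout = TRUE)
  # The sourced file may print its own output, so pick out the tagged line.
  line <- grep("^PEAK_MB ", out, value = TRUE)
  as.numeric(sub("^PEAK_MB ", "", line[length(line)]))
}
```

Called once per checked-out commit, a helper like this gives a reading that does not depend on whatever the calling R session has already allocated.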

The examples for some of the functions from the package are given below.

The following example illustrates the use of the Rperform::get_times() function on the git repository of the package stringr.

> setwd("~/stringr")
> library(Rperform)
> get_times(test_path = "./tests/testthat/test-extract.r", num_commits = 10)


The following example illustrates the use of Rperform::mem_compare() on a test file from the package PeakSegJoint.

> setwd("~/PeakSegJoint")
> Rperform::mem_compare(test_path = "./tests/testthat/test-likelihood.R", num_commits = 5)
           file_name swap_mb leak_mb         msg_val           date_time
11 test-likelihood.R  36.948  34.732 gc and done mes 2015-06-15 13:59:02
12 test-likelihood.R  40.592  35.500 gc and done mes 2015-06-15 13:59:02
13 test-likelihood.R  40.592  35.516 gc and done mes 2015-06-15 13:59:02
21 test-likelihood.R  39.800  35.500 link to example 2015-06-12 12:35:16
22 test-likelihood.R  40.592  35.500 link to example 2015-06-12 12:35:16
23 test-likelihood.R  38.744  35.500 link to example 2015-06-12 12:35:16
31 test-likelihood.R  37.848  34.604    x axis title 2015-06-12 12:21:26
32 test-likelihood.R  40.552  37.308    x axis title 2015-06-12 12:21:26
33 test-likelihood.R  39.696  34.836    x axis title 2015-06-12 12:21:26
41 test-likelihood.R  39.264  35.492 return error ev 2015-06-12 12:08:45
42 test-likelihood.R  40.584  35.492 return error ev 2015-06-12 12:08:45
43 test-likelihood.R  38.736  35.492 return error ev 2015-06-12 12:08:45
51 test-likelihood.R  40.592  35.500 chromStart1 and 2015-06-12 12:07:59
52 test-likelihood.R  38.744  35.516 chromStart1 and 2015-06-12 12:07:59
53 test-likelihood.R  39.800  35.500 chromStart1 and 2015-06-12 12:07:59

Apart from obtaining metrics for the commit log of a repository in its current state, another goal has been to provide useful insight to developers across git branches. Currently, the time-metrics functionality is complete. The plot_btimes() function allows the developer to compare timings across two branches. The example and diagram below show how the function works.

> setwd("~/stringr")
> Rperform::plot_btimes(test_path = "./tests/testthat/test-extract.r", branch1 = "stringi", branch2 = "master")


The vertical line divides the results from branch1 commits and branch2 commits, with the former on the left-hand side. In the above example, there was only one commit in branch1 after the latest common commit.
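
To make "commits in branch1 after the latest common commit" concrete, here is a small sketch of how such a list could be obtained with plain git commands. The helper commits_since_fork() is hypothetical and is not how Rperform is known to do it; it only assumes that git is on the PATH and relies on the standard `git merge-base` and `git rev-list` commands.

```r
# Hypothetical helper (not part of Rperform): list the commits on `branch`
# made after it diverged from `other`, newest first.
commits_since_fork <- function(branch, other, repo = ".") {
  git <- function(...) {
    system2("git", c("-C", shQuote(repo), ...), stdout = TRUE)
  }
  # Latest commit that both branches have in common.
  base <- git("merge-base", branch, other)
  if (length(base) != 1) stop("no common commit found")
  # Commits reachable from `branch` but not from the common commit.
  git("rev-list", paste0(base, "..", branch))
}
```

For the stringr example above, commits_since_fork("stringi", "master", "~/stringr") would be expected to return a single hash, matching the one commit on the left of the vertical line.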

The next set of targets includes completing the memory functionality, writing an initial set of tests, and integrating with the Travis-CI build system, among other things.

The package in its current form can be found on GitHub (Rperform).
