groupdata2 version 1.1.0 released on CRAN

July 4, 2019

(This article was first published on R, and kindly contributed to R-bloggers)

A few days ago, I released a new version of my R package, groupdata2, on CRAN. groupdata2 contains a set of functions for grouping data, such as creating balanced partitions and folds for cross-validation.

Version 1.1.0 adds the balance() function for using up- and downsampling to balance the categories (e.g. classes) in your dataset. The main difference between existing up-/downsampling tools and balance() is that it has methods for dealing with IDs. If, for instance, our dataset contains a number of participants with multiple measurements (rows) each, we might want to either keep all the rows of a participant or delete all of them, e.g. if we are measuring the effect of time and need all the timesteps for a meaningful analysis. balance() currently has four methods for dealing with IDs. As it is a new function, I’m very open to ideas for improvements and additional functionality. Feel free to open an issue on github or send me a mail at r-pkgs at

partition() and fold() can now balance the groups (partitions/folds) by a numerical column. In short, the rows are ordered as smallest, largest, second smallest, second largest, etc. (I refer to this as extreme pairing in the documentation), and grouped as pairs. This seems to work pretty well, especially with larger datasets.

fold() can now create multiple unique fold columns at once, e.g. for repeated cross-validation with cvms. As ensuring the uniqueness of the fold columns can take a while, I’ve added the option to run the column comparisons in parallel. You need to register a parallel backend first (E.g. with doParallel::registerDoParallel).

The new helper function differs_from_previous() finds values, or indices of values, that differ from the previous value by some threshold(s). It is related to the “l_starts” method in group().

groupdata2 can be installed with:

Development version:

Indlægget groupdata2 version 1.1.0 released on CRAN blev først udgivet på .

To leave a comment for the author, please follow the link and comment on their blog: R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)