# groupdata2 version 1.1.0 released on CRAN

**R | Ludvig R. Olsen**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A few days ago, I released a new version of my R package, groupdata2, on CRAN. `groupdata2`

contains a set of functions for grouping data, such as creating balanced partitions and folds for cross-validation.

Version 1.1.0 adds the `balance()`

function for using up- and downsampling to balance the categories (e.g. classes) in your dataset. The main difference between existing up-/downsampling tools and `balance()`

is that it has methods for dealing with IDs. If, for instance, our dataset contains a number of participants with multiple measurements (rows) each, we might want to either keep all the rows of a participant or delete all of them, e.g. if we are measuring the effect of time and need all the timesteps for a meaningful analysis. `balance()`

currently has four methods for dealing with IDs. As it is a new function, I’m very open to ideas for improvements and additional functionality. Feel free to open an issue on github or send me a mail at *r-pkgs at ludvigolsen.dk*

`partition()`

and `fold()`

can now balance the groups (partitions/folds) by a numerical column. In short, the rows are ordered as smallest, largest, second smallest, second largest, etc. (I refer to this as *extreme pairing* in the documentation), and grouped as pairs. This seems to work pretty well, especially with larger datasets.

`fold()`

can now create multiple *unique* fold columns at once, e.g. for repeated cross-validation with cvms. As ensuring the uniqueness of the fold columns can take a while, I’ve added the option to run the column comparisons in parallel. You need to register a parallel backend first (E.g. with `doParallel::registerDoParallel`

).

The new helper function `differs_from_previous()`

finds values, or indices of values, that differ from the previous value by some threshold(s). It is related to the `"l_starts"`

method in `group()`

.

`groupdata2`

can be installed with:

CRAN:

`install.packages("groupdata2")`

Development version:

`devtools::install_github("ludvigolsen/groupdata2")`

**leave a comment**for the author, please follow the link and comment on their blog:

**R | Ludvig R. Olsen**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.