grouper: An R package for Optimal Group Assignment

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introduction

Universities are increasingly using collaborative learning pedagogies, which can benefit learners through deeper understanding of course content and teamwork skills. However, the realisation of these sought-after benefits depends on how educators assign learners to groups.

Educators have formulated various mathematical models to perform this assignment. Some have developed models that prioritise maximising students’ project preferences. Others have developed models that balance students’ preferences, group sizes and group composition. Yet other models address related, but distinct, problems such as assigning students to elective courses or incorporating staff workload into student-to-project supervisor assignments.

Whichever approach is used, it is apparent that there is a need for an algorithmic solution for the assignment. This would ease the burden on the instructor, while providing an objective procedure for the assignment. Our contribution is an R package grouper that offers two flexible group allocation strategies.

Optimisation Models

grouper provides two distinct integer linear programming optimisation models.

library(grouper)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)

Preference-Based Assignment

The Preference-Based Assignment (PBA) model allows educators to assign student groups to topics to maximise overall student preferences for those topics. The topics can be viewed as project titles. The model allows for repetitions of each project title. This formulation also allows each project team to comprise multiple sub-groups. This is useful in cases where the project requires teams with different functionality to work together, e.g. where one team works on a front-end while the other develops a back-end model.

To execute the optimisation routine, an instructor prepares:

      1. A group composition table listing the member students within each self-formed group.
      2. A preference matrix containing the preference that each self-formed group has for each topic.
      3. A YAML file defining the remaining parameters of the model.

Examples

Consider the following simple dataset with 8 students:
pba_gc_ex002
#>   id grouping
#> 1  1        1
#> 2  2        1
#> 3  3        2
#> 4  4        2
#> 5  5        3
#> 6  6        3
#> 7  7        4
#> 8  8        4

Each student is in a self-formed group of size 2, indicated via the grouping column. Suppose that, for this set of students, the instructor wishes to assign students into two topics, with each topic having two sub-groups. This requires the preference matrix to have 4 columns – one for each topic-subgroup combination. Remember that the ordering of topics/subtopics in the preference matrix should be:

Topic1-Subtopic1, Topic2-Subtopic1, Topic1-Subtopic2, Topic2-Subtopic2

The preference matrix should also have 4 rows – one for each self-formed group.

pba_prefmat_ex002
#>      col1 col2 col3 col4
#> [1,]    4    3    2    1
#> [2,]    3    4    2    1
#> [3,]    1    2    4    3
#> [4,]    1    2    3    4
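
As a minimal sketch of the column ordering described above, the labels can be generated in base R and attached to a hand-built copy of the matrix. The names n_subtopics, col_labels and the group1 to group4 row names are illustrative assumptions, used here only to make the ordering visible; the source does not say whether grouper reads or requires dimnames.

```r
# Illustrative only: label the columns in the required order
# (topic varies fastest, subtopic varies slowest).
n_topics    <- 2
n_subtopics <- 2  # corresponds to B in the YAML file

# expand.grid() varies its first argument fastest, which matches
# Topic1-Subtopic1, Topic2-Subtopic1, Topic1-Subtopic2, Topic2-Subtopic2.
labels <- expand.grid(topic = seq_len(n_topics),
                      subtopic = seq_len(n_subtopics))
col_labels <- paste0("Topic", labels$topic, "-Subtopic", labels$subtopic)

# Same values as pba_prefmat_ex002, one row per self-formed group.
pref_mat <- matrix(c(4, 3, 2, 1,
                     3, 4, 2, 1,
                     1, 2, 4, 3,
                     1, 2, 3, 4),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("group", 1:4), col_labels))
pref_mat
```

Printing the labelled matrix before modelling is a cheap way to catch a preference matrix whose columns were entered topic by topic instead of subtopic by subtopic.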

The YAML file for this model contains the following parameters:

n_topics: 2
B: 2
R: 1
nmin: 2
nmax: 2
rmin: 1
rmax: 1

B corresponds to the number of sub-topics per topic, while rmin and rmax denote the minimum and maximum number of repetitions of each topic. nmin and nmax denote the minimum and maximum number of members in each sub-topic group.
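
Before solving, it can help to check that the class size is compatible with these parameters. The sketch below is a back-of-envelope necessary condition derived from the definitions above, not the package's own validation. The helper capacity_range is hypothetical, and it assumes every topic runs between rmin and rmax times and every sub-topic group holds between nmin and nmax students.

```r
# Parameter values copied from the YAML file above.
params <- list(n_topics = 2, B = 2, nmin = 2, nmax = 2, rmin = 1, rmax = 1)
n_students <- 8

# Hypothetical helper: smallest and largest class size that the
# parameters can accommodate under the assumptions stated above.
capacity_range <- function(p) {
  slots <- p$n_topics * p$B  # number of topic-subtopic combinations
  c(min = slots * p$rmin * p$nmin,
    max = slots * p$rmax * p$nmax)
}

rng <- capacity_range(params)
feasible_size <- n_students >= rng[["min"]] && n_students <= rng[["max"]]
rng
feasible_size
```

With these values the lower and upper bounds coincide at 8, so the example is only solvable because the dataset has exactly 8 students.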

It is possible to assign each self-formed group to its optimal choice of topic-subtopic combination. In our solution, we should see that group 1 is assigned to subtopic 1 of topic 1, group 2 is assigned to sub-topic 1 of topic 2, and so on.
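
The claim that every group can receive its top choice can be checked by hand in base R. This is a sketch that assumes a larger entry means a stronger preference, which is consistent with the solver output shown further down; the matrix is retyped here so that the snippet runs without the package.

```r
# Same values as pba_prefmat_ex002.
pref_mat <- matrix(c(4, 3, 2, 1,
                     3, 4, 2, 1,
                     1, 2, 4, 3,
                     1, 2, 3, 4),
                   nrow = 4, byrow = TRUE)

# Column index of each group's most preferred topic-subtopic combination.
top_choice <- apply(pref_mat, 1, which.max)

# If no two groups share a top choice, all of them can be satisfied at once.
no_clash <- !any(duplicated(top_choice))

# Total preference score when every group gets its top choice.
best_total <- sum(pref_mat[cbind(seq_len(nrow(pref_mat)), top_choice)])

top_choice
no_clash
best_total
```

Because each topic-subtopic slot holds exactly one self-formed pair here (nmin and nmax are both 2), distinct top choices are enough to guarantee that no group has to compromise.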

df_ex002_list <- extract_student_info(pba_gc_ex002, "preference", 
                                     self_formed_groups = 2, 
                                     pref_mat = pba_prefmat_ex002)
yaml_ex002_list <- extract_params_yaml(system.file("extdata", 
                                         "pba_params_ex002.yml",  
                                          package = "grouper"),
                                      "preference")
m2 <- prepare_model(df_ex002_list, yaml_ex002_list, "preference")
result2 <- solve_model(m2, with_ROI(solver="glpk"))

assign_groups(result2, assignment = "preference", 
              dframe=pba_gc_ex002, yaml_ex002_list, 
              group_names="grouping")
#>   topic2 subtopic rep group size
#> 1      1        1   1     1    2
#> 2      2        1   1     2    2
#> 3      1        2   1     3    2
#> 4      2        2   1     4    2

Diversity-Based Assignment

The Diversity-Based Assignment (DBA) model enables educators to assign students to groups and topics with the dual, but weighted, aims of maximising diversity (based on student attributes) within groups and balancing specific skill levels across different groups.

To execute the DBA optimisation routine, the instructor prepares:

      1. A group composition table containing:
        1. the member students within each self-formed group,
        2. the demographics that will be used to compute pairwise dissimilarity between students, and
        3. a numeric measure of each student’s skill.
      2. A YAML file defining the remaining parameters of the model.

Examples

Consider the following dataset, that comes with the package. There are 4 students in total.
dba_gc_ex001
#>   id major skill groups
#> 1  1     A     1      1
#> 2  2     A     1      2
#> 3  3     B     3      3
#> 4  4     B     3      4

It is intuitive that an assignment into two groups of size two, based on the diversity of majors alone, should place students 1 and 2 (both major A) in different groups, so that each group pairs a major-A student with a major-B student.

The corresponding YAML file for this exercise, dba_params_ex001.yml, consists of the following lines:

n_topics:  2
R:  1
nmin: 2
nmax: 2
rmin: 1
rmax: 1

To run the assignment, we can use the following commands. We can use either the gurobi solver, or the glpk solver for this example. Both are equally fast.

# Indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
                                      demographic_cols = 2, 
                                      skills = 3, 
                                      self_formed_groups = 4)

yaml_ex001_list <- extract_params_yaml(system.file("extdata", 
                                         "dba_params_ex001.yml",  
                                         package = "grouper"),
                                       "diversity")
m1 <- prepare_model(df_ex001_list, yaml_ex001_list,
                    assignment="diversity", w1=0.5, w2=0.5)

result3 <- solve_model(m1, with_ROI(solver="glpk"))
assign_groups(result3, assignment = "diversity", 
              dframe=dba_gc_ex001, 
              group_names="groups")
#>   topic rep group id major skill
#> 1     1   1     2  2     A     1
#> 2     1   1     3  3     B     3
#> 3     2   1     1  1     A     1
#> 4     2   1     4  4     B     3

We can see that students 2 and 3 have been assigned to topic 1, repetition 1. Students 1 and 4 have been assigned to topic 2, repetition 1. w1 and w2 are both set to 0.5, which means the demographic and skill inputs are given equal weight in the optimisation.
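
With only four students, the three possible pairings can be enumerated directly to see how the reported assignment compares with the alternatives. The two scores below are simple stand-ins chosen for illustration (a count of mixed-major pairs and the gap between group skill totals); they are assumptions and not the objective function that grouper optimises.

```r
# Same values as dba_gc_ex001.
students <- data.frame(id = 1:4,
                       major = c("A", "A", "B", "B"),
                       skill = c(1, 1, 3, 3))

# The three ways of splitting four students into two pairs.
pairings <- list(list(c(1, 2), c(3, 4)),
                 list(c(1, 3), c(2, 4)),
                 list(c(1, 4), c(2, 3)))

# Illustrative scores for one pairing (not the package's objective).
score_pairing <- function(p) {
  # Number of groups whose two members have different majors.
  diversity <- sum(sapply(p, function(g) {
    students$major[g[1]] != students$major[g[2]]
  }))
  # Absolute difference between the two groups' total skill.
  totals <- sapply(p, function(g) sum(students$skill[g]))
  c(diversity = diversity, skill_gap = abs(diff(totals)))
}

scores <- t(sapply(pairings, score_pairing))
rownames(scores) <- c("12|34", "13|24", "14|23")
scores
```

Under these stand-in scores, the pairing returned by the solver (students 1 and 4, students 2 and 3) ties with 13|24 for the most mixed majors and the smallest skill gap, while 12|34 is worst on both.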

At present, the routines use the daisy function from the cluster package to compute a pairwise dissimilarity matrix between students. However, it is also possible to supply your own custom dissimilarity matrix. Consider the following dataset of 4 students:

dba_gc_ex003
#>   year   major self_groups id
#> 1    1    math           1  1
#> 2    2 history           2  2
#> 3    3    dsds           3  3
#> 4    4    elts           4  4

Now consider a situation where we wish to consider years 1 and 2 different from years 3 and 4, and math and dsds (STEM majors) to be different from elts and history (non-STEM majors). For each difference, we assign a score of 1. This means that students 1 and 2 would have a dissimilarity score of 1 due to their difference in majors. Students 1 and 3 would also have a score of 1, but due to their difference in years. Students 1 and 4 would have score of 2, due to their differences in majors and in years. The overall dissimilarity matrix would be:

d_mat <- matrix(c(0, 1, 1, 2,
                  1, 0, 2, 1,
                  1, 2, 0, 1,
                  2, 1, 1, 0), nrow=4, byrow = TRUE)
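
For larger classes, typing the matrix by hand becomes error-prone. As a sketch, the same scoring rule can be computed from the data in base R; the names senior, stem and d_rule are illustrative, and the data frame is retyped so that the snippet runs without the package.

```r
# Same year and major values as dba_gc_ex003.
students <- data.frame(year = 1:4,
                       major = c("math", "history", "dsds", "elts"))

# Rule 1: years 1 and 2 differ from years 3 and 4.
senior <- students$year >= 3
# Rule 2: STEM majors (math, dsds) differ from non-STEM majors.
stem <- students$major %in% c("math", "dsds")

# Each rule contributes 1 to the dissimilarity of a pair that differs on it.
d_rule <- outer(senior, senior, "!=") + outer(stem, stem, "!=")

# Hand-typed matrix from the text, for comparison.
d_mat <- matrix(c(0, 1, 1, 2,
                  1, 0, 2, 1,
                  1, 2, 0, 1,
                  2, 1, 1, 0), nrow = 4, byrow = TRUE)

all(d_rule == d_mat)
```

The comparison confirms that the rule-based matrix matches the hand-typed one entry for entry.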

To run the optimisation for this model, we can execute the following code:

df_ex003_list <- extract_student_info(dba_gc_ex003, "diversity",
                                       skills = NULL,
                                       self_formed_groups = 3,
                                       d_mat=d_mat)
yaml_ex003_list <- extract_params_yaml(system.file("extdata",   
                                         "dba_params_ex003.yml",
                                         package = "grouper"), 
                                       "diversity")
m3 <- prepare_model(df_ex003_list, yaml_ex003_list, w1=1.0, w2=0.0)
result <- solve_model(m3, with_ROI(solver="glpk"))

assign_groups(result, "diversity", dba_gc_ex003,
              group_names="self_groups")
#>   topic rep group year   major id
#> 1     1   1     1    1    math  1
#> 2     1   1     4    4    elts  4
#> 3     2   1     2    2 history  2
#> 4     2   1     3    3    dsds  3

As you can see, the two members within each group are maximally different from each other – they differ in terms of their year, and in terms of their major. Notice that we specified

skills = NULL

and

w2 = 0.0

This ensures that no skills column is taken into account in this optimisation.

Gurobi Optimiser

While the routines above use the glpk optimiser, we recommend using the Gurobi optimiser. The latter is commercial software that runs to completion much faster than glpk. For more information, please refer to this website. Note that academic licenses are available from Gurobi.

Shiny Applications

The package provides numerous options for each of the two optimisation models. However, there are also two shiny applications included with the package. They may be useful if one only needs a straightforward group assignment. 

To run the DBA shiny app, the following code will suffice:
library(shiny)
runApp(appDir=system.file("shiny", "dbaWebApp", package="grouper"))

# Analogous code for PBA app:
# runApp(appDir=system.file("shiny", "pbaWebApp", package="grouper"))

[Screenshot of the diversity-based shiny application]

The system folders with the shiny apps also contain example csv files for use with the apps.

More Details

The two optimisation models are flexibly parametrised. Here are some of the features:
    • Define the number of repetitions for each topic.
    • Define the maximum and minimum number of group members for each topic.

The vignettes also contain the precise mathematical formulation of the optimisation models. For full details, please refer to these links:

grouper: An R package for Optimal Group Assignment was first posted on April 29, 2026 at 6:18 am.