grouper: An R package for Optimal Group Assignment
Introduction
Universities are increasingly using collaborative learning pedagogies, which can benefit learners through deeper understanding of course content and teamwork skills. However, the realisation of these sought-after benefits depends on how educators assign learners to groups.
Educators have formulated various mathematical models to perform this assignment. Some have developed models that prioritise maximising students’ project preferences. Others developed a model that prioritised students’ preferences, group sizes and group composition. Yet other models address related, but distinct, problems such as assigning students to elective courses or incorporating staff workload into student-to-project supervisor assignments.
Whichever approach is used, there is a clear need for an algorithmic solution to the assignment problem. This would ease the burden on the instructor while providing an objective procedure for the assignment. Our contribution is an R package, grouper, that offers two flexible group allocation strategies.
Optimisation Models
grouper provides two distinct integer linear programming optimisation models.
library(grouper)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
Preference-Based Assignment
The Preference-Based Assignment (PBA) model allows educators to assign student groups to topics to maximise overall student preferences for those topics. The topics can be viewed as project titles. The model allows for repetitions of each project title. This formulation also allows each project team to comprise multiple sub-groups. This is useful in cases where the project requires teams with different functionality to work together, e.g. where one team works on a front-end while the other develops a back-end model.
To execute the optimisation routine, an instructor prepares:
- A group composition table listing the member students within each self-formed group
- A preference matrix containing the preference that each self-formed group has for each topic.
- A YAML file defining the remaining parameters of the model.
Examples
Consider the following simple dataset with 8 students:
pba_gc_ex002
#>   id grouping
#> 1  1        1
#> 2  2        1
#> 3  3        2
#> 4  4        2
#> 5  5        3
#> 6  6        3
#> 7  7        4
#> 8  8        4
Each student is in a self-formed group of size 2, indicated via the grouping column. Suppose that, for this set of students, the instructor wishes to assign students into two topics, with each topic having two sub-groups. This requires the preference matrix to have 4 columns – one for each topic-subgroup combination. Remember that the ordering of topics/subtopics in the preference matrix should be:
Topic1-Subtopic1, Topic2-Subtopic1, Topic1-Subtopic2, Topic2-Subtopic2
There should also be 4 rows in the preference matrix, one for each self-formed group.
pba_prefmat_ex002
#>      col1 col2 col3 col4
#> [1,]    4    3    2    1
#> [2,]    3    4    2    1
#> [3,]    1    2    4    3
#> [4,]    1    2    3    4
The YAML file for this model contains the following parameters:
n_topics: 2
B: 2
R: 1
nmin: 2
nmax: 2
rmin: 1
rmax: 1
B corresponds to the number of sub-topics per topic, while rmin and rmax denote the minimum and maximum number of repetitions of each topic. nmin and nmax denote the minimum and maximum number of members in each sub-topic group.
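As a quick sanity check on these parameters, the sketch below compares the number of students with the smallest and largest number of places the settings appear to allow. It is an illustration in base R and does not use grouper: the `params` list simply mirrors the YAML values above, and the capacity formula (topics × repetitions × sub-groups × members) is an assumption about how the parameters combine, not something taken from the package.

```r
# Values copied from the YAML file above (a plain list; grouper is not used here).
params <- list(n_topics = 2, B = 2, R = 1, nmin = 2, nmax = 2, rmin = 1, rmax = 1)
n_students <- 8

# Assumed capacity: topics x repetitions x sub-groups per topic x members per sub-group.
min_places <- params$n_topics * params$rmin * params$B * params$nmin
max_places <- params$n_topics * params$rmax * params$B * params$nmax

# The class size should fall between the two bounds for an assignment to exist.
feasible_size <- n_students >= min_places && n_students <= max_places
print(c(min_places = min_places, max_places = max_places))
print(feasible_size)
```

Under this assumption both bounds equal 8, so the 8 students fill every sub-group exactly.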
It is possible to assign each self-formed group to its optimal choice of topic-subtopic combination. In our solution, we should see that group 1 is assigned to subtopic 1 of topic 1, group 2 is assigned to subtopic 1 of topic 2, and so on.
df_ex002_list <- extract_student_info(pba_gc_ex002, "preference",
                                      self_formed_groups = 2,
                                      pref_mat = pba_prefmat_ex002)
yaml_ex002_list <- extract_params_yaml(system.file("extdata",
                                                   "pba_params_ex002.yml",
                                                   package = "grouper"),
                                       "preference")
m2 <- prepare_model(df_ex002_list, yaml_ex002_list, "preference")
result2 <- solve_model(m2, with_ROI(solver="glpk"))
assign_groups(result2, assignment = "preference",
              dframe=pba_gc_ex002, yaml_ex002_list,
              group_names="grouping")
#>   topic2 subtopic rep group size
#> 1      1        1   1     1    2
#> 2      2        1   1     2    2
#> 3      1        2   1     3    2
#> 4      2        2   1     4    2
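The claim that every self-formed group can receive its top choice can be checked directly from the preference matrix, without a solver. The following base R sketch retypes the matrix from the example; the column labels (`T1-S1` and so on) are illustrative names for the topic-subtopic ordering described earlier and are not produced by grouper.

```r
# Preference matrix from the example: rows are self-formed groups,
# columns are topic-subtopic combinations in the order given earlier.
pref <- matrix(c(4, 3, 2, 1,
                 3, 4, 2, 1,
                 1, 2, 4, 3,
                 1, 2, 3, 4), nrow = 4, byrow = TRUE)
colnames(pref) <- c("T1-S1", "T2-S1", "T1-S2", "T2-S2")  # illustrative labels

# Column holding the highest preference for each group.
best_col <- apply(pref, 1, which.max)
best_slot <- colnames(pref)[best_col]

# If no two groups share a favourite, every group can get its first choice.
no_conflict <- !any(duplicated(best_col))
total_pref <- sum(pref[cbind(seq_len(nrow(pref)), best_col)])

print(data.frame(group = 1:4, slot = best_slot))
print(total_pref)
```

Each group favours a different column, so the favourites do not clash and the total preference reaches its largest possible value of 16, in line with the assignment reported by the solver.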
Diversity-Based Assignment
The Diversity-Based Assignment (DBA) model enables educators to assign students to groups and topics with the dual, but weighted, aims of maximising diversity (based on student attributes) within groups and balancing specific skill levels across different groups.
To execute the DBA optimisation routine, the instructor prepares:
- A group composition table containing:
  - the member students within each self-formed group,
  - the demographics that will be used to compute pairwise dissimilarity between students, and
  - a numeric measure of each student’s skill.
- A YAML file defining the remaining parameters of the model.
Examples
Consider the following dataset, which comes with the package. There are 4 students in total.
dba_gc_ex001
#>   id major skill groups
#> 1  1     A     1      1
#> 2  2     A     1      2
#> 3  3     B     3      3
#> 4  4     B     3      4
It is intuitive that an assignment into two groups of size two, based on the diversity of majors alone, should place one student from major A and one student from major B in each group, rather than pairing students 1 and 2 (both major A) together.
The corresponding YAML file for this exercise, dba_params_ex001.yml, consists of the following lines:
n_topics: 2
R: 1
nmin: 2
nmax: 2
rmin: 1
rmax: 1
To run the assignment, we can use the following commands. For this example, either the gurobi solver or the glpk solver can be used; both are equally fast.
# Indicate appropriate columns using integer ids.
df_ex001_list <- extract_student_info(dba_gc_ex001, "diversity",
                                      demographic_cols = 2,
                                      skills = 3,
                                      self_formed_groups = 4)
yaml_ex001_list <- extract_params_yaml(system.file("extdata",
                                                   "dba_params_ex001.yml",
                                                   package = "grouper"),
                                       "diversity")
m1 <- prepare_model(df_ex001_list, yaml_ex001_list,
                    assignment="diversity", w1=0.5, w2=0.5)
result3 <- solve_model(m1, with_ROI(solver="glpk"))
assign_groups(result3, assignment = "diversity",
              dframe=dba_gc_ex001,
              group_names="groups")
#>   topic rep group id major skill
#> 1     1   1     2  2     A     1
#> 2     1   1     3  3     B     3
#> 3     2   1     1  1     A     1
#> 4     2   1     4  4     B     3
We can see that students 2 and 3 have been assigned to topic 1, repetition 1. Students 1 and 4 have been assigned to topic 2, repetition 1. Both w1 and w2 are set to 0.5, which means the demographic and skills inputs are given equal weight in the optimisation.
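With only four students, this result can be sanity-checked by listing every possible split into two pairs. The sketch below is a base R illustration, not the package's optimisation model: it assumes a simple 0/1 mismatch on major as the dissimilarity and uses the difference in total skill between the two groups as a rough measure of balance. The names `diversity` and `skill_gap` are hypothetical.

```r
# Student data copied from the example above.
students <- data.frame(id = 1:4,
                       major = c("A", "A", "B", "B"),
                       skill = c(1, 1, 3, 3))

# Assumed dissimilarity: 1 if two students have different majors, 0 otherwise.
d <- outer(students$major, students$major, FUN = "!=") * 1

# With four students and groups of two, student 1's partner fixes the split.
splits <- lapply(2:4, function(partner) {
  g1 <- c(1, partner)
  g2 <- setdiff(1:4, g1)
  data.frame(group1 = paste(g1, collapse = ","),
             group2 = paste(g2, collapse = ","),
             # Total within-group dissimilarity (higher is more diverse).
             diversity = d[g1[1], g1[2]] + d[g2[1], g2[2]],
             # Difference in total skill between groups (lower is more balanced).
             skill_gap = abs(sum(students$skill[g1]) - sum(students$skill[g2])))
})
scores <- do.call(rbind, splits)
print(scores)
```

Under these assumptions, pairing students 1 and 2 scores worst on both criteria, while the two mixed-major splits tie. The solver output above corresponds to one of the tied splits.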
At present, the routines use the daisy function from the cluster package to compute a pairwise dissimilarity matrix between students. However, it is also possible to supply your own custom dissimilarity matrix. Consider the following dataset of 4 students:
dba_gc_ex003
#>   year   major self_groups id
#> 1    1    math           1  1
#> 2    2 history           2  2
#> 3    3    dsds           3  3
#> 4    4    elts           4  4
Now consider a situation where we wish to treat years 1 and 2 as different from years 3 and 4, and math and dsds (STEM majors) as different from elts and history (non-STEM majors). For each difference, we assign a score of 1. This means that students 1 and 2 would have a dissimilarity score of 1 due to their difference in majors. Students 1 and 3 would also have a score of 1, but due to their difference in years. Students 1 and 4 would have a score of 2, due to their differences in both majors and years. The overall dissimilarity matrix would be:
d_mat <- matrix(c(0, 1, 1, 2,
1, 0, 2, 1,
1, 2, 0, 1,
2, 1, 1, 0), nrow=4, byrow = TRUE)
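Rather than typing the matrix by hand, the same scores can be derived from the two rules. The following base R sketch is one way to do it, assuming the data shown above; the helper names `senior`, `stem` and `d_rule` are made up for this illustration.

```r
# Student data copied from the example above.
students <- data.frame(id = 1:4,
                       year = 1:4,
                       major = c("math", "history", "dsds", "elts"))

# Indicator for years 3 and 4, and indicator for STEM majors.
senior <- as.integer(students$year >= 3)
stem <- as.integer(students$major %in% c("math", "dsds"))

# One point for a difference in year band plus one point for a difference in major type.
d_rule <- abs(outer(senior, senior, "-")) + abs(outer(stem, stem, "-"))
print(d_rule)

# Compare against the hand-typed matrix.
d_typed <- matrix(c(0, 1, 1, 2,
                    1, 0, 2, 1,
                    1, 2, 0, 1,
                    2, 1, 1, 0), nrow = 4, byrow = TRUE)
print(all(d_rule == d_typed))
```

Building the matrix from indicators scales more easily to larger classes and avoids transcription errors.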
To run the optimisation for this model, we can execute the following code:
df_ex003_list <- extract_student_info(dba_gc_ex003, "diversity",
                                      skills = NULL,
                                      self_formed_groups = 3,
                                      d_mat=d_mat)
yaml_ex003_list <- extract_params_yaml(system.file("extdata",
                                                   "dba_params_ex003.yml",
                                                   package = "grouper"),
                                       "diversity")
m3 <- prepare_model(df_ex003_list, yaml_ex003_list, w1=1.0, w2=0.0)
result <- solve_model(m3, with_ROI(solver="glpk"))
assign_groups(result, "diversity", dba_gc_ex003,
              group_names="self_groups")
#>   topic rep group year   major id
#> 1     1   1     1    1    math  1
#> 2     1   1     4    4    elts  4
#> 3     2   1     2    2 history  2
#> 4     2   1     3    3    dsds  3
As the output shows, the two members of each group are maximally different from one another: they differ both in year and in major. Notice that we specified skills = NULL and w2 = 0.0. This ensures that no skills columns were taken into account in this optimisation.
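A short bound argument confirms that this grouping cannot be improved on diversity alone. The sketch below, in base R, sums the within-group dissimilarity for the reported groups and compares it with a simple upper bound: no pair can score more than the largest entry of the matrix. It is an illustrative check under the custom matrix above, not a reproduction of the package's objective function.

```r
# Custom dissimilarity matrix from the example above.
d_mat <- matrix(c(0, 1, 1, 2,
                  1, 0, 2, 1,
                  1, 2, 0, 1,
                  2, 1, 1, 0), nrow = 4, byrow = TRUE)

# Groups reported in the output above (student ids).
groups <- list(c(1, 4), c(2, 3))

# Within-group dissimilarity summed over the two groups.
achieved <- sum(vapply(groups, function(g) d_mat[g[1], g[2]], numeric(1)))

# Upper bound: each pair can score at most the largest entry of d_mat.
upper_bound <- length(groups) * max(d_mat)

print(c(achieved = achieved, upper_bound = upper_bound))
```

Since the achieved score meets the bound, no other split of these four students can be more diverse under this matrix.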
Gurobi Optimiser
While the routines above use the glpk optimiser, we recommend using the Gurobi optimiser. The latter is commercial software that runs to completion much faster than glpk. For more information, please refer to the Gurobi website. Note that academic licenses are available from Gurobi.
Shiny Applications
The package provides numerous options for each of the two optimisation models. However, there are also two shiny applications included with the package. They may be useful if one only needs a straightforward group assignment.
To run the DBA shiny app, the following code will suffice:
library(shiny)
runApp(appDir=system.file("shiny", "dbaWebApp", package="grouper"))
# Analogous code for PBA app:
# runApp(appDir=system.file("shiny", "pbaWebApp", package="grouper"))
Here is a screen shot of the diversity-based shiny application.

The system folders with the shiny apps also contain example csv files for use with the apps.
More Details
The two optimisation models are flexibly parametrised. Here are some of the features:
- Define the number of repetitions for each topic.
- Define the maximum and minimum number of group members for each topic.
grouper: An R package for Optimal Group Assignment was first posted on April 29, 2026 at 6:18 am.