Finding possible class schedules

Fellgernon Bit - rstats

6 years ago

[This article was first published on Fellgernon Bit - rstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Over the weekend my brother wanted to figure out his class schedule for the next semester. He is a veterinary medicine and zootechnology student at UNAM. For this upcoming semester there is a set of classes he has to take and each has 8 or so instructor options. The website where he finds the class times lists about 8 pre-constructed class schedules. So he normally finds one he likes quite a bit, and then manually starts checking if he can change X instructor for Y for a given class. He does this based on the referalls and information he has gathered about the instructors, plus he factors in whether it’d be an overall better schedule. For example, he might prefer to have a packed Tuesday if that means that he can leave early on Friday and avoid classes on Saturday.

The problem is that it’s very easy to make a mistake. You (well he) gets all excited thinking that he’s found the perfect schedule. Only to then realize that there is a conflict between two classes. Or that the practical portion of a class is in a location one hour away from the university, meaning that the schedule he has selected won’t work. This process is very frustrating.

I was watching him and I started to think if I could help him with some code. Turns out that it was straightforward to write some code to find which options are valid. Once I wrote a test case, it took us like half an hour to fill out the data. I know that tomorrow is when he and his classmates start registering for classes, so this information might help his classmates.

First, I define some helper functions. These are rather straightforward but I’ll be using them later on. For example, dias() is just there for typing less.

## Helper functions
dias <- function(d, i) {
    paste0(d, i)
}

extract <- function(m, p) {
    m[[p]]
}

extract_names <- function(m, p) {
    names(m)[p]
}

Next comes the input information. I organized it in a set of nested list objects. The schedule is stored as a character vector. For example, Lucia Eliana’s class meets on Wednesdays (__M__iercoles in Spanish) from 9 to 11 am. I only keep the starting hours (9 and 10 am) because otherwise the code won’t detect valid opitons that include another class that starts at 11 am. For classes that are 1 hour away from the university, we included 1 hour before and 1 hour after the class.

## Input class/prof info and schedule
materias <- list(
    repro = list(
        'lucia eliana' = c(dias('m', 9:10), dias('j', 9:10), dias('v', 8:13)),
        'esquivel lacroix' = c(dias('l', 14:15), dias('ma', 12:18), dias('m', 14:15)),
        'ismael porras' = c(dias('l', 9:10), dias('m', 8:14), dias('j', 9:10)),
        'esquivel lacroix 2' = c(dias('l', 14:15), dias('m', 14:15), dias('v', 12:18)),
        'salvador galina' = c(dias('ma', 8:13), dias('j', 9:10), dias('v', 9:10)),
        'alberto balcazar' = c(dias('l', 15:16), dias('j', 12:18), dias('v', 14:15)),
        'ana myriam boeta' = c(dias('l', 8:13), dias('m', 10:11), dias('v', 9:10)),
        'rafael eduardo paz' = c(dias('l', 11:17), dias('j', 14:15), dias('v', 16:17)),
        'juan heberth' = c(dias('ma', 9:10), dias('m', 11:17), dias('v', 11:12)),
        'vicente octavio mejia' = c(dias('ma', 8:17))
    ),
    economia = list(
        'valentin efren espinoza' = c(dias('l', 8:10), dias('ma', 9:11)),
        'maria del pilar velazquez' = c(dias('l', 16:18), dias('m', 16:18)),
        'arturo alonso pesado' = c(dias('l', 11:13), dias('j', 11:13)),
        'laura mendez' = c(dias('ma', 13:15), dias('m', 18:20)),
        'laura mendez 2' = c(dias('j', 11:13), dias('v', 11:13)),
        'manuela garcia' = c(dias('l', 17:19), dias('v', 16:18)),
        'francisco alejandro' = c(dias('ma', 7:9), dias('j', 7:9)),
        'isaac reyes' = c(dias('m', 13:15), dias('v', 13:15)),
        'jose luis tinoco' = c(dias('ma', 12:14), dias('m', 9:11)),
        'isaac reyes 2' = c(dias('l', 14:16), dias('ma', 14:16))
        
    ),
    bacterianas = list(
        'jose luis gutierrez' = dias('s', 8:11),
        'rodrigo mena' = c(dias('ma', 18:19), dias('j', 18:19)),
        'beatriz arellano' = c(dias('l', 7:8), dias('ma', 10:11)),
        'de la pena, ramirez ortega' = c(dias('j', 18:19), dias('v', 18:19)),
        'ramirez ortega' = c(dias('m', 7:8), dias('j', 7:8)),
        'rodrigo mena 2' = c(dias('ma', 16:17), dias('m', 16:17)),
        'de la pena' = dias('s', 8:11),
        'efren diaz aparicio' = dias('s', 8:11),
        'lucia del carmen favila' = dias('s', 8:11)
    ),
    parasitarias = list(
        'cintli martinez' = c(dias('j', 16:17), dias('v', 18:20)),
        'osvaldo froylan' = c(dias('ma', 18:19), dias('j', 18:20)),
        'maria quintero, agustin perez' = c(dias('ma', 13:14), dias('m', 7:9)),
        'maria quintero' = c(dias('m', 16:18), dias('j', 16:17)),
        'evangelina romero' = c(dias('ma', 7:8), dias('v', 7:9)),
        'guadarrama 01' = c(dias('m', 7:8), dias('j', 11:13)),
        'guadarrama 03' = c(dias('ma', 13:15), dias('v', 7:8)),
        'guadarrama 04' = c(dias('l', 16:17), dias('ma', 18:20)),
        'guadarrama 05' = c(dias('l', 7:9), dias('j', 7:8))
    ),
    diagnosticas = list(
        '1701' = c(dias('l', 11:13), dias('m', 11:16)),
        '1702' = c(dias('j', 13:15), dias('v', 13:18)),
        '1703' = c(dias('ma', 7:9), dias('v', 8:13)),
        '1704' = c(dias('l', 18:20), dias('j', 13:18)),
        '1705' = c(dias('l', 11:13), dias('m', 7:11)),
        '1706' = c(dias('ma', 15:17), dias('m', 15:19)),
        '1707' = c(dias('ma', 10:12), dias('j', 10:15)),
        '1708' = c(dias('l', 18:20), dias('ma', 15:19)),
        '1709' = c(dias('l', 11:13), dias('j', 8:13)),
        '1711' = c(dias('j', 13:15), dias('v', 10:13))
    )
)

Now that the input information is complete, I use expand.grid() to find out all the different possible options.

## Get all the options
options <- expand.grid(lapply(materias, function(x) { seq_len(length(x))}))
dim(options)

## [1] 81000     5

There’s 81,00 of them including the classes that meet on Saturday. You can see why it’s a frustrating process to find which combination of classes work when doing this manually.

Next, I explore all these options to find those that are valid, meaning that none of the classes overlap. I do this by finding which options have no duplicated hours from the character vectors defined earlier. Nothing fancy.

valid <- apply(options, 1, function(input) {
    info <- mapply(extract, materias, input)
    !any(duplicated(unlist(info)))
})

Now that I have the valid options, I can find the names of the instructors for them. There’s 2,847 valid schedules in the end, out of the 81,000. That’s 3.5 percent!

valid_prof <- apply(options[valid, ], 1, function(input) {
    mapply(extract_names, materias, input)
})
ncol(valid_prof)

## [1] 2847

You can search the interactive version here to select only the options with a given instructor. For example, in my brother’s case there are 30 valid options once he decided to prioritize two instructors as shown in the non-interactive table below.

## Ideally, this code would create an interactive table, but it doesn't work for some reason:
#library('DT')
#datatable(t(valid_prof), options = list(pagingType='full_numbers', pageLength=10), rownames = FALSE)
valid_prof <- t(valid_prof)
rownames(valid_prof) <- seq_len(nrow(valid_prof))
top_options <- valid_prof[valid_prof[, 1] == 'lucia eliana' & valid_prof[, 2] %in% c('isaac reyes', 'isaac reyes 2'), ]
kable(top_options, format = 'markdown', row.names = TRUE)

	repro	economia	bacterianas	parasitarias	diagnosticas
5	lucia eliana	isaac reyes 2	jose luis gutierrez	cintli martinez	1701
11	lucia eliana	isaac reyes 2	rodrigo mena	cintli martinez	1701
14	lucia eliana	isaac reyes 2	beatriz arellano	cintli martinez	1701
19	lucia eliana	isaac reyes 2	ramirez ortega	cintli martinez	1701
25	lucia eliana	isaac reyes 2	de la pena	cintli martinez	1701
31	lucia eliana	isaac reyes 2	efren diaz aparicio	cintli martinez	1701
37	lucia eliana	isaac reyes 2	lucia del carmen favila	cintli martinez	1701
46	lucia eliana	isaac reyes 2	jose luis gutierrez	osvaldo froylan	1701
50	lucia eliana	isaac reyes 2	beatriz arellano	osvaldo froylan	1701
58	lucia eliana	isaac reyes 2	ramirez ortega	osvaldo froylan	1701
67	lucia eliana	isaac reyes 2	de la pena	osvaldo froylan	1701
76	lucia eliana	isaac reyes 2	efren diaz aparicio	osvaldo froylan	1701
85	lucia eliana	isaac reyes 2	lucia del carmen favila	osvaldo froylan	1701
123	lucia eliana	isaac reyes 2	jose luis gutierrez	guadarrama 01	1701
130	lucia eliana	isaac reyes 2	rodrigo mena	guadarrama 01	1701
134	lucia eliana	isaac reyes 2	beatriz arellano	guadarrama 01	1701
137	lucia eliana	isaac reyes 2	de la pena, ramirez ortega	guadarrama 01	1701
144	lucia eliana	isaac reyes 2	de la pena	guadarrama 01	1701
151	lucia eliana	isaac reyes 2	efren diaz aparicio	guadarrama 01	1701
158	lucia eliana	isaac reyes 2	lucia del carmen favila	guadarrama 01	1701
209	lucia eliana	isaac reyes 2	jose luis gutierrez	guadarrama 05	1701
217	lucia eliana	isaac reyes 2	rodrigo mena	guadarrama 05	1701
222	lucia eliana	isaac reyes 2	de la pena, ramirez ortega	guadarrama 05	1701
232	lucia eliana	isaac reyes 2	de la pena	guadarrama 05	1701
242	lucia eliana	isaac reyes 2	efren diaz aparicio	guadarrama 05	1701
252	lucia eliana	isaac reyes 2	lucia del carmen favila	guadarrama 05	1701
872	lucia eliana	isaac reyes 2	jose luis gutierrez	guadarrama 05	1704
885	lucia eliana	isaac reyes 2	de la pena	guadarrama 05	1704
894	lucia eliana	isaac reyes 2	efren diaz aparicio	guadarrama 05	1704
903	lucia eliana	isaac reyes 2	lucia del carmen favila	guadarrama 05	1704

Reproducibility

## Reproducibility info
library('devtools')
session_info()

## Session info --------------------------------------------------------------

##  setting  value                       
##  version  R version 3.3.0 (2016-05-03)
##  system   x86_64, mingw32             
##  ui       RStudio (0.99.902)          
##  language (EN)                        
##  collate  English_United States.1252  
##  tz       America/Mexico_City         
##  date     2016-08-02

## Packages ------------------------------------------------------------------

##  package   * version date       source        
##  devtools  * 1.12.0  2016-06-24 CRAN (R 3.3.1)
##  digest      0.6.9   2016-01-08 CRAN (R 3.3.0)
##  evaluate    0.9     2016-04-29 CRAN (R 3.3.0)
##  formatR     1.4     2016-05-09 CRAN (R 3.3.0)
##  highr       0.6     2016-05-09 CRAN (R 3.3.0)
##  knitr     * 1.13    2016-05-09 CRAN (R 3.3.0)
##  magrittr    1.5     2014-11-22 CRAN (R 3.3.0)
##  memoise     1.0.0   2016-01-29 CRAN (R 3.3.0)
##  rsconnect   0.4.3   2016-05-02 CRAN (R 3.3.0)
##  stringi     1.1.1   2016-05-27 CRAN (R 3.3.0)
##  stringr     1.0.0   2015-04-30 CRAN (R 3.3.0)
##  withr       1.0.2   2016-06-20 CRAN (R 3.3.1)

Want more?

Check other @jhubiostat student blogs at Bmore Biostats as well as topics on #rstats.

To leave a comment for the author, please follow the link and comment on their blog: Fellgernon Bit - rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.