Version 0.2.2 is out and it has big changes!
Some new functionality has been added to the R package fairmodels, which is available on CRAN. In this post, I will cover it in depth. The biggest changes concern the fairness_check() function, parity_loss, and a new plot (not really new, but it hasn’t been properly introduced before).
The boundary between a fair and a discriminatory decision is arbitrary and subjective. Let’s say that people from towns A and B are identical, and that the number of people applying for loans is large and equal in both towns. Can a bank loan system from town A, which has an approval rate of 65% for people from town A and 55% for people from town B, be called discriminatory? What if the approval rate for people from town B were 40%? Or maybe 20%?
As you can see, this is very subjective, and different people can view it in different ways. It was hard to come up with a fixed and tangible border between fair and unfair. The only thing I could find was the four-fifths rule (also called the 80% rule) that was introduced by the EEOC. I also asked lawyers from Poland and elsewhere in Europe whether they could point me to other directives or rules in this region, but no such law exists. Still, the 80% rule is a good, tangible boundary!
Changes in fairmodels
To adhere to that rule, some changes needed to be made. In fairness_check(), ratios between unprivileged and privileged subgroups were introduced in place of differences. An ideal fair classifier would have metric rates that comply with this criterion:
$$\varepsilon < \frac{\mathrm{metric}_i}{\mathrm{metric}_{\mathrm{privileged}}} < \frac{1}{\varepsilon} \quad \text{for every subgroup } i$$
where i denotes a subgroup and epsilon equals 0.8 by default, which corresponds to checking the four-fifths rule. As a result, the fairness check now uses a different scale.
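To make the rule concrete, here is a minimal base R sketch applied to the loan example from the beginning of the post. It assumes the criterion is epsilon < metric_i / metric_privileged < 1 / epsilon; the helper passes_ratio_rule() is hypothetical and is not a fairmodels function.

```r
# Hypothetical helper, NOT part of fairmodels: checks the ratio criterion
# for a named vector of per-subgroup metric scores.
passes_ratio_rule <- function(scores, privileged, epsilon = 0.8) {
  # Ratio of every subgroup's score to the privileged subgroup's score
  ratios <- scores / scores[[privileged]]
  # Every ratio must lie strictly between epsilon and 1 / epsilon
  all(ratios > epsilon & ratios < 1 / epsilon)
}

# Approval rates from the town example (town A treated as privileged)
passes_ratio_rule(c(A = 0.65, B = 0.55), privileged = "A")  # ratio is about 0.85
passes_ratio_rule(c(A = 0.65, B = 0.40), privileged = "A")  # ratio is about 0.62
```

With the default epsilon of 0.8, the first case passes and the second does not, so under this rule the 65% vs. 55% system would not be flagged, while 65% vs. 40% would.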
Now let’s look at how the fairness_object plot has changed.
As a reminder, this is the code snippet I used:
```r
library('fairmodels')
library('DALEX')
library('ranger')

data("german")

# Risk is a factor; convert it to a 0/1 numeric target
y_numeric <- as.numeric(german$Risk) - 1

# Logistic regression and a shallow random forest
lm_model <- glm(Risk ~ ., data = german, family = binomial(link = "logit"))
rf_model <- ranger::ranger(Risk ~ ., data = german, probability = TRUE,
                           max.depth = 3, num.trees = 100, seed = 1)

# DALEX explainers (the first column of german is the target, Risk)
explainer_lm <- DALEX::explain(lm_model, data = german[, -1], y = y_numeric)
explainer_rf <- DALEX::explain(rf_model, data = german[, -1], y = y_numeric)

# Fairness check with Sex as the protected attribute
fobject <- fairness_check(explainer_lm, explainer_rf,
                          protected = german$Sex, privileged = "male")
plot(fobject)
```
This idea also solved one problem. When metric scores were very small, the difference between them was barely visible. With ratios, scores around 0.01 are no longer a problem!
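A quick base R illustration of that point, using made-up scores (they do not come from the german models above): two pairs of scores with the same relative gap but very different magnitudes.

```r
# Made-up scores for illustration: same relative gap, different magnitudes
large <- c(privileged = 0.50, unprivileged = 0.25)
small <- c(privileged = 0.02, unprivileged = 0.01)

# Differences shrink with the scale of the metric
diff_large <- large[["privileged"]] - large[["unprivileged"]]   # 0.25
diff_small <- small[["privileged"]] - small[["unprivileged"]]   # 0.01

# Ratios do not depend on the scale
ratio_large <- large[["unprivileged"]] / large[["privileged"]]  # 0.5
ratio_small <- small[["unprivileged"]] / small[["privileged"]]  # 0.5
```

On a difference scale the second pair looks almost perfectly fair, while on a ratio scale both pairs show the same disparity.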
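Parity loss also needed to be changed. It is now a little more complicated, but its properties remain unchanged:
$$\mathrm{metric}_{\mathrm{parity\ loss}} = \sum_{i} \left| \ln\left( \frac{\mathrm{metric}_i}{\mathrm{metric}_{\mathrm{privileged}}} \right) \right|$$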
The main point of introducing parity loss was to be able to:
- Aggregate metric scores among subgroups
- Have a non-negative value where 0 indicates no bias
- Compare metrics with each other
All of those criteria are met. The scary logarithm is there for two reasons. One is to map the ideal ratio, which is 1, to 0. The other is to give the same value when the ratio is inverted: the logarithm of an inverted ratio has the opposite sign, which is why the absolute value of the logarithm is taken. Pretty neat, right?
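Both properties are easy to verify in base R. The sketch below assumes parity loss is the sum, over subgroups, of the absolute log ratio to the privileged subgroup; parity_loss_sketch() is a hypothetical helper with made-up scores, not the fairmodels implementation.

```r
# Hypothetical helper, NOT the fairmodels implementation
parity_loss_sketch <- function(scores, privileged) {
  ratios <- scores / scores[[privileged]]
  # |log(1)| = 0 for the privileged subgroup, so it adds nothing to the sum
  sum(abs(log(ratios)))
}

# Made-up metric scores: B has ratio 0.8 and C has ratio 1.25 relative to A
scores <- c(A = 0.5, B = 0.4, C = 0.625)
loss <- parity_loss_sketch(scores, privileged = "A")

# A perfectly fair set of scores gives a loss of exactly 0
no_bias <- parity_loss_sketch(c(A = 0.5, B = 0.5), privileged = "A")

# Inverted ratios are penalised equally: |log(0.8)| equals |log(1 / 0.8)|
same_penalty <- isTRUE(all.equal(abs(log(0.8)), abs(log(1 / 0.8))))
```

Here subgroups B and C contribute the same amount to the loss, because 1.25 is just 0.8 inverted.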
This created another problem
When some metric, let's say FPR for subgroup A, is equal to 0, dividing it by anything (apart from 0) results in 0. This is not a good property, so fairmodels returns NA instead. So how can we check whether there is a problem with zeros? With the metric_scores plot! It is easy and intuitive: it simply shows the raw metric scores for each subgroup and model. We can use fobject from earlier.
```r
ms <- metric_scores(fobject)
plot(ms)
```
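To see why a zero score is a problem for ratios, here is a minimal base R sketch of the idea. safe_ratio() is a hypothetical helper and the rates are made up; how fairmodels handles this internally may differ, and treating a zero denominator as missing as well is an assumption of this sketch.

```r
# Hypothetical helper, NOT the fairmodels implementation
safe_ratio <- function(scores, privileged) {
  ratios <- scores / scores[[privileged]]
  # A ratio of exactly 0 has a log of -Inf, and a zero denominator gives
  # Inf or NaN; mark all of these as missing (the latter is an assumption)
  ratios[!is.finite(ratios) | ratios == 0] <- NA
  ratios
}

# Made-up false positive rates; subgroup A has an FPR of exactly 0
fpr <- c(A = 0, B = 0.2, C = 0.1)
ratios <- safe_ratio(fpr, privileged = "B")

# Without the guard, the zero would make the log ratio infinite
log(0 / 0.2)
```

When a ratio comes back as NA, looking at the raw per-subgroup scores is the quickest way to find the zero that caused it.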
This wraps up v0.2.2 of fairmodels.
Summary and future
This was a hard change for me because, instead of elegant differences, I had to introduce ratios. But I know that it is a change for the better. At the moment I am developing a fairness module for dalex in Python. In the future, both the R and Python versions will also support regression and individual fairness, along with other, more innovative ways of measuring bias.
Be sure to check out my other (slightly outdated) blog posts if you haven’t already.
If you have any problems with the package, leave an issue here.
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
To see more R-related content, visit http://www.r-bloggers.com/