What is new in fairmodels?

Posted on September 22, 2020 by Jakub Wiśniewski in R bloggers

[This article was first published on R in ResponsibleML on Medium, and kindly contributed to R-bloggers].

Version 0.2.2 is out and it has big changes!

TL;DR

New functionality has been added to the R package fairmodels, which is available on CRAN. In this post, I will cover it in depth. The biggest changes concern the fairness_check() function, parity_loss, and a plot that is not strictly new but has not been properly introduced before.

Problem

The boundary between fair and discriminatory decisions is arbitrary and subjective. Let’s say that people from towns A and B are identical and that the number of people applying for loans is large and equal in both towns. Can the loan system of a bank from town A, which has an approval rate of 65% for people from town A and 55% for people from town B, be called discriminatory? What if the approval rate for people from town B were 40%? Or maybe 20%?

As you can see, this is very subjective, and different people can view it in different ways. It was hard to come up with a fixed and tangible border between fair and unfair. The only thing I could find was the four-fifths rule (also called the 80% rule) introduced by the EEOC. I also asked lawyers from Poland and Europe to point me to other directives or rules in this region, but no such law exists. Still, the 80% rule is a good tangible boundary!
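To make the rule concrete, the loan example above can be checked in a few lines of R. This is only a sketch: the helper name passes_four_fifths is made up for illustration, and the rule is applied here as the ratio of town B's approval rate to town A's.

```r
# Hypothetical helper (not part of fairmodels): TRUE when the ratio of
# approval rates is at least the threshold (0.8 for the 80% rule).
passes_four_fifths <- function(rate_unprivileged, rate_privileged, threshold = 0.8) {
  rate_unprivileged / rate_privileged >= threshold
}

rate_town_a  <- 0.65                  # approval rate for people from town A
rates_town_b <- c(0.55, 0.40, 0.20)   # the three scenarios for town B

ratios <- rates_town_b / rate_town_a
round(ratios, 3)
# 0.846 0.615 0.308

passes_four_fifths(rates_town_b, rate_town_a)
# TRUE FALSE FALSE
```

Under this reading, only the 55% scenario stays within the four-fifths boundary.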

Changes in fairmodels

Fairness check

To adhere to that rule, some changes had to be made. In fairness_check(), differences between privileged and unprivileged subgroups were replaced with ratios. An ideal fair classifier would have metric rates that comply with this criterion:

$$\epsilon < \frac{metric_i}{metric_{privileged}} < \frac{1}{\epsilon}$$

where i denotes a subgroup and epsilon equals 0.8 by default, which amounts to checking the four-fifths rule. As a result, the fairness check now uses a different scale.
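As a rough illustration of that criterion, the sketch below checks whether each subgroup's ratio to the privileged subgroup falls strictly between epsilon and 1/epsilon. The helper within_epsilon and the metric values are made up for this example; this is not the code fairmodels runs internally.

```r
# Hypothetical helper (not the fairmodels implementation): TRUE for each
# subgroup whose ratio to the privileged subgroup lies in (epsilon, 1/epsilon).
within_epsilon <- function(metric, privileged, epsilon = 0.8) {
  ratio <- metric / metric[[privileged]]
  ratio > epsilon & ratio < 1 / epsilon
}

# Made-up true positive rates for three subgroups
tpr <- c(male = 0.80, female = 0.70, other = 0.50)

within_epsilon(tpr, privileged = "male")
#   male female  other
#   TRUE   TRUE  FALSE
```

The privileged subgroup always has a ratio of 1, so it trivially passes; the interesting entries are the remaining subgroups.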

Now let's look at how this changes the plot of the fairness object.
As a reminder, here is the code snippet that creates it.

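library(fairmodels)
library(DALEX)
library(ranger)

# German credit data shipped with fairmodels
data("german")

# Recode the Risk factor to 0/1 for the explainers
y_numeric <- as.numeric(german$Risk) - 1

# Logistic regression
lm_model <- glm(Risk ~ .,
                data = german,
                family = binomial(link = "logit"))

# Random forest returning class probabilities
rf_model <- ranger::ranger(Risk ~ .,
                           data = german,
                           probability = TRUE,
                           max.depth = 3,
                           num.trees = 100,
                           seed = 1)

# DALEX explainers; the first column (Risk) is the target, so it is dropped from data
explainer_lm <- DALEX::explain(lm_model,
                               data = german[, -1],
                               y = y_numeric)
explainer_rf <- DALEX::explain(rf_model,
                               data = german[, -1],
                               y = y_numeric)

# Fairness check with Sex as the protected attribute
fobject <- fairness_check(explainer_lm, explainer_rf,
                          protected = german$Sex,
                          privileged = "male")
plot(fobject)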

This change also solved one problem. When metric scores were very small, the difference between them was barely visible. With ratios, scores around 0.01 are no longer a problem!
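A small numeric example shows why ratios help with tiny scores. The false positive rates below are made up for illustration.

```r
# Made-up false positive rates for two subgroups; both are tiny
fpr_privileged   <- 0.02
fpr_unprivileged <- 0.01

# The difference is barely noticeable on a 0-1 scale ...
fpr_unprivileged - fpr_privileged
# -0.01

# ... while the ratio shows that one rate is half the other
fpr_unprivileged / fpr_privileged
# 0.5
```

A difference of 0.01 looks harmless, but a ratio of 0.5 is well below the 0.8 boundary.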

Parity loss

Parity loss also had to change. It is now a little more complicated, but its properties remain the same.

The main point of introducing parity loss was to be able to:

Aggregate metric scores among subgroups
Have a positive value where 0 indicates no bias
Compare metrics with each other

All of those criteria are met. The scary logarithm is there for two reasons. One is to map the ideal ratio, which is 1, to 0. The other is to get the same value when the ratio is inverted: the logarithm then has the opposite sign, which is why its absolute value is taken. Pretty neat, right?
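Both properties can be seen in a short sketch. It assumes that parity loss for a single metric is the sum, over subgroups, of the absolute logarithm of each subgroup's ratio to the privileged subgroup; the function name parity_loss_sketch and the scores are made up, and this is not the package's own code.

```r
# Sketch of parity loss for one metric, assuming it is the sum of
# absolute log-ratios against the privileged subgroup.
parity_loss_sketch <- function(metric, privileged) {
  ratio <- metric / metric[[privileged]]
  sum(abs(log(ratio)))
}

# Made-up metric scores per subgroup
equal_scores  <- c(male = 0.6, female = 0.6)
half_scores   <- c(male = 0.6, female = 0.3)
double_scores <- c(male = 0.3, female = 0.6)

parity_loss_sketch(equal_scores, "male")    # 0: identical scores, no bias
parity_loss_sketch(half_scores, "male")     # log(2), about 0.693
parity_loss_sketch(double_scores, "male")   # same value for the inverted ratio
```

Because the privileged subgroup's own ratio is 1, it contributes log(1) = 0 to the sum.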

This created another problem

When some metric, say FPR for subgroup A, is equal to 0, dividing it by anything (apart from 0) gives a ratio of 0, whose logarithm is not finite. This is not a good property, so fairmodels returns NA instead. So how can we check whether there is a problem with zeros? With the metric_scores plot! It is easy and intuitive: it simply shows the raw metric scores for each subgroup and model. We can use fobject from earlier.
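The zero problem is easy to reproduce in base R. The guard function below is a hypothetical sketch of the idea of returning NA; the exact conditions that fairmodels checks are not spelled out in this post, so treat them as assumptions.

```r
# Made-up false positive rates; subgroup B plays the privileged role
fpr <- c(A = 0, B = 0.10)

ratio <- fpr[["A"]] / fpr[["B"]]
ratio
# 0

log(ratio)
# -Inf

# Hypothetical guard (not the fairmodels code): return NA when the ratio
# is zero, infinite, or undefined instead of a non-finite logarithm.
safe_abs_log_ratio <- function(metric_i, metric_privileged) {
  ratio <- metric_i / metric_privileged
  if (is.nan(ratio) || ratio == 0 || is.infinite(ratio)) {
    return(NA_real_)
  }
  abs(log(ratio))
}

safe_abs_log_ratio(0, 0.10)      # NA
safe_abs_log_ratio(0.05, 0.10)   # log(2), about 0.693
```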

ms <- metric_scores(fobject)
plot(ms)

This wraps up v0.2.2 of fairmodels.

Summary and future

This was a hard change for me, as I had to replace elegant differences with ratios. But I know it is a change for the better. At the moment I am developing a fairness module for dalex in Python. In the future, both the R and Python versions will also support regression and individual fairness, along with other, more innovative ways of measuring bias.

Be sure to check out my other (slightly outdated) posts if you haven’t yet.

If you have any problems with the package, leave an issue here

If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.


What is new in fairmodels? was originally published in ResponsibleML on Medium, where people are continuing the conversation by highlighting and responding to this story.
