# Bias Detection and Mitigation
**Note:** While the dataset used in the showcase system contains multiple protected characteristics in the context of employment access (e.g., race, sex, age, disability status, ...), the implementation focuses on detecting and mitigating bias related to sex for demonstration purposes.
Setting appropriate thresholds and metrics for detection of unacceptable bias is a complex task that requires careful consideration of the specific context and the potential impact of the model's predictions.
## Bias Detection
Two main types of bias are considered in the showcase system:
- bias inherent to the dataset, i.e. biases present in the training data that can lead to unfair predictions;
- bias stemming from the ML model, i.e. biases introduced by the model itself during training or inference.
Various metrics are calculated to assess the fairness of the model's predictions. These metrics are logged to the MLflow experiment log during training so that the model's fairness can be compared across versions.
Two Python packages are used to calculate the relevant metrics (see the sketch after this list):
- Fairlearn: a toolkit for assessing and mitigating unfairness in machine learning models.
- AIF360: a comprehensive toolkit for detecting and mitigating bias in AI systems.
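As an illustration, the sketch below shows how a few such metrics could be computed with Fairlearn and logged to MLflow. The names `model`, `X_test`, `y_test`, and the choice of metrics are assumptions made for this example and do not necessarily match the showcase code.

```python
import mlflow
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    equalized_odds_difference,
    selection_rate,
)
from sklearn.metrics import accuracy_score

# Assumed to exist: a trained classifier `model`, test features `X_test`
# (a DataFrame containing a SEX column), and test labels `y_test`.
y_pred = model.predict(X_test)
sex = X_test["SEX"]

# Per-group view: accuracy and selection rate split by the sensitive feature.
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sex,
)

# Scalar fairness metrics that are easy to track across model versions.
dp_diff = demographic_parity_difference(y_test, y_pred, sensitive_features=sex)
eo_diff = equalized_odds_difference(y_test, y_pred, sensitive_features=sex)

with mlflow.start_run():
    mlflow.log_metric("demographic_parity_difference", dp_diff)
    mlflow.log_metric("equalized_odds_difference", eo_diff)
    for group, acc in frame.by_group["accuracy"].items():
        mlflow.log_metric(f"accuracy_sex_{group}", acc)
```

Logging the scalar fairness metrics alongside the usual performance metrics makes them comparable across MLflow runs, i.e. across model versions.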
## Bias Mitigation
As an example of a bias mitigation technique, the showcase system implements a correlation removal pre-processing step. This transformation is implemented using the Fairlearn library.
In line with the risk assessment for the showcase, the `SEX` feature is identified as a sensitive attribute that should be considered for bias mitigation.
The mitigation is applied to the training data before model training, so the model is trained on the mitigated dataset. This helps to ensure that the model does not learn biased patterns from the training data, leading to fairer predictions at the expense of slightly lower model accuracy.
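A minimal sketch of this step, assuming the training features are held in a pandas DataFrame `X_train` that still contains the `SEX` column and that labels are available as `y_train`; the model class and the `alpha` value are illustrative choices, not necessarily those used in the showcase:

```python
import pandas as pd
from fairlearn.preprocessing import CorrelationRemover
from sklearn.ensemble import HistGradientBoostingClassifier

# Remove the linear correlation between SEX and the remaining features.
# alpha=1.0 removes the correlation completely (illustrative choice).
remover = CorrelationRemover(sensitive_feature_ids=["SEX"], alpha=1.0)
X_train_mitigated = remover.fit_transform(X_train)

# CorrelationRemover drops the sensitive column from its output, so rebuild
# a DataFrame with the remaining feature names.
remaining_columns = [c for c in X_train.columns if c != "SEX"]
X_train_mitigated = pd.DataFrame(X_train_mitigated, columns=remaining_columns)

# Train on the decorrelated features; SEX itself is no longer a model input.
model = HistGradientBoostingClassifier()  # placeholder model class
model.fit(X_train_mitigated, y_train)
```

Note that the same fitted transformation would also need to be applied to any data seen at inference time, so that training and serving features stay consistent.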
The following plots compare the feature importance of the `SEX` feature for models trained on the original dataset and on the mitigated dataset (see the page on explainability for more details on the SHAP approach).
The plots on the left show the feature importance for the model trained on the unmitigated dataset, while the plots on the right show the feature importance for the model trained on the mitigated dataset.
*(Figures: SHAP violin and bar plots of feature importance; unmitigated model on the left, mitigated model on the right.)*
We can see that the mitigation reduced the importance of the `SEX` feature, visible in the smaller range of the SHAP values in the violin plot for this feature.
The bar plot shows that the overall importance of the `SEX` feature is also reduced to the point that it is no longer among the top 10 most important features of the model.
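For reference, comparison plots of this kind could be generated roughly as follows. This is only a sketch: `model`, `X_train_mitigated`, and `X_test_mitigated` are assumed names, and the explainability page documents the SHAP setup actually used in the showcase.

```python
import shap

# Explain the model's positive-class probability; a small background sample
# keeps the model-agnostic explainer reasonably fast.
background = X_train_mitigated.sample(100, random_state=0)
explainer = shap.Explainer(lambda X: model.predict_proba(X)[:, 1], background)
shap_values = explainer(X_test_mitigated)

# Violin plot: per-feature distribution of SHAP values.
shap.summary_plot(shap_values.values, X_test_mitigated, plot_type="violin")

# Bar plot: mean absolute SHAP value per feature (overall importance ranking).
shap.summary_plot(shap_values.values, X_test_mitigated, plot_type="bar")
```

Running the same code once for the unmitigated model and once for the mitigated model yields the side-by-side comparison shown above.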