Bias mitigation
Compliance Info
Below we map this engineering practice to the articles of the AI Act that it helps to comply with.
- Art. 10 (Data and Data Governance), in particular:
- Art. 10(2)(g): Appropriate measures for bias detection, prevention, and mitigation
- Art. 10(3): Ensure appropriate statistical properties, including as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used
- Art. 10(5): Use of special categories of personal data in bias detection and correction
- Art. 15 (Accuracy, Robustness and Cybersecurity), in particular:
- Art. 15(4): bias mitigation increases resilience against bias-related errors and inconsistencies
Motivation¶
Bias is commonly considered one of the most detrimental effects of artificial intelligence (AI) use.
Art. 10 mandates the examination, detection, prevention, and mitigation of biases in the data that could have a harmful impact on health, safety, or fundamental rights.
As such, data governance activities should include practices to cover these requirements and map them to activities in the machine learning lifecycle.
Interdisciplinary Nature of Bias and Fairness
Bias and fairness are complex, multifaceted issues with social, legal, and ethical dimensions that extend far beyond technical considerations.
While this document focuses on technical approaches for bias detection and mitigation, it is vital to recognize that technological solutions alone are insufficient to address these challenges comprehensively. Effective bias mitigation requires interdisciplinary collaboration involving domain experts, ethicists, legal professionals, and affected communities.
Legal notions of bias and fairness may vary across jurisdictions and contexts (such as access to employment, healthcare, or financial services). Understanding the context in which a system operates is crucial for ensuring compliance with relevant laws and regulations and identifying appropriate fairness measures and bias mitigation strategies.
The techniques outlined on this page should be considered as part of a broader strategy that includes organizational policies, diverse team composition, stakeholder engagement, and ongoing monitoring and evaluation.
Implementation Notes¶

Bias and Fairness Analysis Techniques¶
- Conduct Exploratory Data Analysis (EDA): Analyze the dataset for imbalances or patterns that may suggest bias, such as over-representation or under-representation of certain groups.
- Fairness Metrics: Calculate fairness metrics such as demographic parity/disparate impact, equal opportunity, or equalized odds to quantify bias in datasets and model outputs (see the sketch after this list).
    - See the fairlearn documentation for an introduction to commonly used metrics.
- Diversity Analysis: Evaluate the dataset's demographic diversity, ensuring it represents all relevant populations appropriately.
- Group Representation: Check whether groups are proportionally represented in the dataset (calculated as the fraction of the total dataset size).
- Overall Accuracy Equality: Ensure that accuracy rates are equal across groups.
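As a minimal sketch of how such an analysis could look in Python, the snippet below checks group representation and computes a demographic parity and an equalized odds difference with fairlearn. The toy data, the column names (`gender`, `y_true`, `y_pred`), and the values are illustrative assumptions.

```python
import pandas as pd
from fairlearn.metrics import (
    demographic_parity_difference,
    equalized_odds_difference,
)

# Illustrative data: 'gender' is the sensitive attribute, 'y_true' the label,
# 'y_pred' the model output; in practice these come from your dataset and model.
df = pd.DataFrame({
    "gender": ["f", "f", "f", "m", "m", "m", "m", "m"],
    "y_true": [1, 0, 1, 1, 0, 1, 0, 1],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 1],
})

# Group representation: fraction of the dataset that each group makes up.
print(df["gender"].value_counts(normalize=True))

# Demographic parity difference: gap in selection rates between groups (0 = parity).
print(demographic_parity_difference(
    df["y_true"], df["y_pred"], sensitive_features=df["gender"]))

# Equalized odds difference: largest gap in TPR/FPR between groups (0 = parity).
print(equalized_odds_difference(
    df["y_true"], df["y_pred"], sensitive_features=df["gender"]))
```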
Mitigation Techniques¶
Bias mitigation techniques broadly fall into three categories, based on their applicability during the machine learning lifecycle:
```mermaid
flowchart LR
    assessment[Bias Assessment]
    audit[Auditing]
    subgraph mitigation[Bias Mitigation]
        direction TB
        preproc[Preprocessing Techniques]
        inproc[Inprocessing Techniques / Model Training]
        postproc[Postprocessing Techniques]
        preproc --> inproc
        inproc --> postproc
    end
    assessment --> mitigation
    mitigation --> audit
```
Preprocessing Techniques¶
The goal of preprocessing techniques is to adjust the dataset before training the model to ensure a fair representation of all demographic groups in the training data.
- Resampling: Use oversampling or undersampling to balance the representation of different demographic groups.
- Synthetic data generation: Generate synthetic examples for under-represented groups to ensure better balance in the dataset.
- Reweighing: Adjust the weights of data instances to ensure fair representation across groups.
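As an illustration of reweighing, the sketch below computes Kamiran-Calders-style instance weights from a pandas DataFrame so that the sensitive attribute and the label become statistically independent under the weighted distribution. The column names (`gender`, `y`) and the data are assumptions made for the example; libraries such as aif360 ship a ready-made `Reweighing` preprocessor.

```python
import pandas as pd

# Illustrative dataset: 'gender' is the sensitive attribute, 'y' the binary label.
df = pd.DataFrame({
    "gender": ["f", "f", "f", "m", "m", "m", "m", "m"],
    "y":      [1, 0, 0, 1, 1, 1, 0, 1],
})

# Reweighing (Kamiran & Calders): weight each (group, label) cell by
# P(group) * P(label) / P(group, label), so that group and label are
# independent under the weighted distribution.
p_group = df["gender"].value_counts(normalize=True)
p_label = df["y"].value_counts(normalize=True)
p_joint = df.groupby(["gender", "y"]).size() / len(df)

df["weight"] = [
    p_group[g] * p_label[y] / p_joint[(g, y)]
    for g, y in zip(df["gender"], df["y"])
]

# The weights can be passed to most scikit-learn estimators via sample_weight.
print(df)
```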
Inprocessing Techniques / Model Training¶
- Bias-Corrected Features: Transform features to reduce correlations with sensitive attributes (e.g., gender, race).
- Fair Representations / Fairness Constraints: Use fairness-aware models that explicitly optimize for fairness metrics alongside predictive accuracy (see the sketch below).
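One way to implement such constrained training is fairlearn's reductions approach. The sketch below wraps a logistic regression in an `ExponentiatedGradient` with a demographic parity constraint; the synthetic data and feature names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)

# Illustrative data: two features, a binary label y, and a binary sensitive attribute A.
X = rng.normal(size=(200, 2))
A = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * A + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Wrap a standard estimator in a fairness-constrained reduction:
# the model is trained to keep the demographic parity difference small.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),
)
mitigator.fit(X, y, sensitive_features=A)

y_pred = mitigator.predict(X)
```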
Post-processing Techniques¶
- Outcome Adjustments: Adjust decision thresholds or outputs to ensure equitable outcomes across demographic groups.
- Equalized Odds Postprocessing: Modify predictions to satisfy equalized odds constraints (see the sketch after this list).
- Bias Mitigation Strategies: Apply fairness postprocessing methods, such as calibration by group or equalized odds adjustments.
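Fairlearn's `ThresholdOptimizer` is one implementation of such outcome adjustments: it chooses group-specific decision thresholds for an already trained classifier so that the resulting predictions approximately satisfy an equalized odds constraint. The toy data below is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

rng = np.random.default_rng(1)

# Illustrative data: features X, binary label y, binary sensitive attribute A.
X = rng.normal(size=(200, 2))
A = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * A + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Train an unconstrained base model first.
base = LogisticRegression().fit(X, y)

# Post-process its scores with group-specific thresholds chosen to satisfy
# (approximately) equalized odds.
postprocessor = ThresholdOptimizer(
    estimator=base,
    constraints="equalized_odds",
    prefit=True,
    predict_method="predict_proba",
)
postprocessor.fit(X, y, sensitive_features=A)

y_pred_fair = postprocessor.predict(X, sensitive_features=A, random_state=0)
```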
Auditing¶
Regularly audit data to ensure it is free from systemic errors or biases.
- Segmentation Analysis: Partition the dataset based on sensitive attributes and assess performance metrics for each segment to detect disparities (see the sketch after this list).
- Subgroup Fairness Checks: Compare outcomes for different demographic subgroups to identify discrepancies.
- Drift Detection: Use tools to detect data or model drift that may reintroduce bias over time.
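A simple way to implement such segmentation checks is fairlearn's `MetricFrame`, which evaluates arbitrary metrics per group. The labels, predictions, and group column below are illustrative assumptions.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, recall_score
from fairlearn.metrics import MetricFrame, selection_rate

# Illustrative evaluation data: true labels, model predictions, and the group
# each instance belongs to (e.g., derived from a sensitive attribute).
y_true = pd.Series([1, 0, 1, 1, 0, 1, 0, 1])
y_pred = pd.Series([1, 0, 0, 1, 0, 1, 1, 1])
group = pd.Series(["f", "f", "f", "m", "m", "m", "m", "m"])

mf = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "recall": recall_score,
        "selection_rate": selection_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=group,
)

print(mf.by_group)       # per-segment metrics
print(mf.difference())   # largest gap between segments for each metric
```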
Privacy Concerns¶
Art. 10(5) makes an exception that allows the use of special categories of personal data for bias detection and correction.
When using this sensitive data, it is crucial to ensure that the data is handled in compliance with applicable data protection regulations, such as GDPR. The purpose for which the data is used and its scope must be clearly defined and documented. In particular, this includes deleting the sensitive data after the bias correction process is completed.
Keep an audit trail of the data processing activities, including the use of sensitive data for bias detection and correction. A workflow orchestration tool can support this requirement by providing a clear record of the data processing steps.
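As a minimal, purely illustrative sketch of these safeguards in code (independent of any particular orchestration tool), the snippet below confines the sensitive attribute to a narrow scope, records the processing activity in a log, and deletes the sensitive column once the bias check is completed. The function, column names, and log format are assumptions made for the example.

```python
import logging
from datetime import datetime, timezone

import pandas as pd

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("bias_audit_trail")

def run_bias_check(df: pd.DataFrame, sensitive_col: str) -> dict:
    """Use the sensitive column only for bias detection, log the activity,
    and delete the column afterwards."""
    audit_log.info(
        "Started bias check using special-category column '%s' at %s "
        "(purpose: Art. 10(5) bias detection and correction)",
        sensitive_col, datetime.now(timezone.utc).isoformat(),
    )

    # Bias detection step (here: selection rate per group as a simple example).
    selection_rates = df.groupby(sensitive_col)["y_pred"].mean().to_dict()

    # Delete the sensitive data once the check is completed.
    df.drop(columns=[sensitive_col], inplace=True)
    audit_log.info("Deleted column '%s' after bias check", sensitive_col)

    return selection_rates

# Illustrative usage
data = pd.DataFrame({"gender": ["f", "f", "m", "m"], "y_pred": [0, 1, 1, 1]})
print(run_bias_check(data, sensitive_col="gender"))
```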
Key Technologies¶
- AI Fairness 360 (aif360) by IBM
- Fairlearn by Microsoft
- What-If Tool by Google
- imblearn (imbalanced-learn): Python package for scikit-learn that provides tools for classification with imbalanced classes
Further Reading¶
- European Parliamentary Research Service (2022) - Auditing the quality of datasets used in algorithmic decision-making systems, Study
- Caton, Haas (2024) - Fairness in Machine Learning: A Survey
- Barocas, Hardt, Narayanan (2023) - Fairness and Machine Learning: Limitations and Opportunities
- Mitigating Bias in Machine Learning