Data quality

Compliance Info

Below we map this engineering practice to the articles of the AI Act whose requirements it helps to meet.

Motivation

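Art. 10(3) of the AI Act sets quality requirements for the data sets used to train, validate and test models. In view of the intended purpose, these data sets should be:

  • relevant,
  • sufficiently representative,
  • complete, and
  • free of errors.

The Act qualifies the last two properties: data sets should be complete and free of errors "to the best extent possible".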

Different techniques help to achieve these qualities at different stages of the system lifecycle.

Implementation Notes

Note that the techniques discussed in this section are technical approaches to ensuring data quality. They become fully effective only when accompanied by organizational and governance measures.

Data Preprocessing

  • Detect and handle missing or incomplete data
    • Conduct data analysis to identify missing fields.
    • Use statistical methods to assess if missing data skews results.
    • Implement appropriate handling (e.g., interpolation, mean/mode imputation).
  • Perform data consistency checks
    • Enforce data schema for tabular data.
    • Identify and remove duplicate records.
    • Ensure data formats are consistent (e.g., all dates in a single agreed format such as ISO 8601).
    • Check for missing values and determine handling strategies (imputation or removal).
  • Keep preprocessing consistent, versioned and reproducible
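
The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the record layout, the SCHEMA mapping, the list of accepted date formats and all function names are hypothetical. It uses only the standard library to stay dependency-free; a real pipeline would more likely use a dataframe library and a dedicated validation tool. Mean imputation is shown because the list mentions it, but it can distort variances and correlations, so it should be checked against the statistical assessment of missingness.

```python
import hashlib
import json
from datetime import datetime
from statistics import mean

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"id": int, "age": float, "signup": str}

def missing_report(rows, schema):
    """Count missing (absent or None) values per field."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in schema}

def schema_violations(rows, schema):
    """Return (row_index, field) pairs whose non-missing value has the wrong type."""
    return [(i, f) for i, r in enumerate(rows) for f, t in schema.items()
            if r.get(f) is not None and not isinstance(r[f], t)]

def drop_duplicates(rows, key="id"):
    """Keep only the first record for each key value."""
    seen, unique = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique

def normalize_date(value, formats=("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")):
    """Parse a date in one of the assumed source formats and return ISO 8601 text."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def impute_mean(rows, field):
    """Return copies of the rows with missing `field` values set to the observed mean."""
    fill = mean(r[field] for r in rows if r.get(field) is not None)
    return [{**r, field: fill if r.get(field) is None else r[field]} for r in rows]

def fingerprint(rows):
    """Hash the records so a preprocessing run can be tied to an exact data version."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

rows = [
    {"id": 1, "age": 30.0, "signup": "2024-01-15"},
    {"id": 2, "age": None, "signup": "15.02.2024"},
    {"id": 2, "age": None, "signup": "15.02.2024"},  # duplicate record
    {"id": 3, "age": 50.0, "signup": "03/20/2024"},
]

print("missing per field:", missing_report(rows, SCHEMA))
print("schema violations:", schema_violations(rows, SCHEMA))

clean = impute_mean(drop_duplicates(rows), "age")
clean = [{**r, "signup": normalize_date(r["signup"])} for r in clean]
print("data version:", fingerprint(clean))
```

Every function returns new data instead of modifying its input, so the same raw file always yields the same cleaned output and the same fingerprint. Storing that fingerprint next to the code version is one simple way to keep preprocessing reproducible.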

Data Quality Validation

  • Validate data against ground truth
    • Cross-check a sample of the dataset against verified real-world sources or domain experts.
  • Ensure data accuracy through automated validation that detects:
    • Logical inconsistencies (e.g., negative age values).
    • Outliers and anomalies, using statistical methods (e.g., z-score, IQR analysis).
  • Produce automated data quality reports for human review and inclusion in technical documentation.
  • Monitor for data drift over time
    • Set up periodic validation checks to see if the data distribution changes over time.
    • Retrain models if significant drift is detected.
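
The automated checks and the drift monitoring above might look like the following sketch. All names, the 0 to 120 age range and the sample data are hypothetical. The source does not prescribe a drift statistic, so the Population Stability Index (PSI) is used here as one common option. The 0.25 alert threshold is a widely quoted rule of thumb, not a requirement of the AI Act, and should be tuned per feature.

```python
import math
from statistics import quantiles

def logical_errors(rows, low=0, high=120):
    """Return indices of rows whose age falls outside an assumed plausible range."""
    return [i for i, r in enumerate(rows) if not low <= r["age"] <= high]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]

def psi(reference, current, bins=4, eps=1e-6):
    """Population Stability Index of one numeric feature between two samples."""
    edges = quantiles(reference, n=bins)  # bins - 1 cut points from the reference

    def shares(sample):
        counts = [0] * bins
        for v in sample:
            counts[sum(v > e for e in edges)] += 1  # index of the bin holding v
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(sample), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

records = [{"age": 34}, {"age": -2}, {"age": 51}]
ages = [23, 25, 27, 29, 31, 33, 35, 95]   # reference sample
shifted = [a + 30 for a in ages]          # later sample with a shifted distribution

report = {
    "logical_errors": logical_errors(records),
    "outliers": iqr_outliers(ages),
    "psi_same": psi(ages, ages),
    "psi_shifted": psi(ages, shifted),
}
print(report)

PSI_ALERT = 0.25  # assumed rule-of-thumb threshold
if report["psi_shifted"] > PSI_ALERT:
    print("significant drift detected: review the data and consider retraining")
```

A dictionary like `report` can be serialized on a schedule and attached to the technical documentation, which gives reviewers a stable artifact to inspect instead of ad hoc console output.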

Key Technologies

Legal Disclaimer

The information provided on this website is for informational purposes only and does not constitute legal advice. The tools, practices, and mappings presented here reflect our interpretation of the EU AI Act and are intended to support understanding and implementation of trustworthy AI principles. Following this guidance does not guarantee compliance with the EU AI Act or any other legal or regulatory framework. We are not affiliated with, nor do we endorse, any of the tools listed on this website.