
Data documentation

Compliance Info

Below, we map this engineering practice to the articles of the AI Act whose requirements it helps address.

  • Art. 10 (Data and Data Governance), in particular:
    • Art. 10(2), clear and structured documentation of data sets supports other data governance practices
  • Art. 11(1) in conjunction with Annex IV (Technical Documentation), in particular:
    • Annex IV(2)(d), datasheets for training data sets are explicitly mentioned
    • Annex IV(2)(g), validation and test data sets should be documented and characterized
  • Art. 13 (Transparency and Provision of Information to Deployers), in particular:
    • Art. 13(3)(b)(vi), training, validation, and test data sets should be appropriately documented

Rationale

As part of the overall data governance strategy, data documentation is a key practice to ensure that data sets are well understood and properly managed.

This includes documenting the data's purpose, sources, curation methods, structure, and any transformations or processing applied to it. Documenting the data's limitations, potential biases, and ethical implications is equally important.

Implementation Notes

Most data documentation methodologies (see below) provide predefined templates or guidelines for documenting data sets, so that these practices are applied consistently.
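As an illustration, a template can be captured as a typed structure so that sections left empty are flagged automatically. The following is a minimal sketch in Python. The `Datasheet` class, its field names, and the `missing_sections` helper are hypothetical. They loosely follow the elements listed under Rationale and are not taken from any particular documentation methodology. The example data set is likewise invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    # Hypothetical template sections; real methodologies define their own.
    name: str
    purpose: str = ""
    sources: str = ""
    curation_method: str = ""
    structure: str = ""
    transformations: str = ""
    limitations: str = ""
    known_biases: str = ""
    ethical_considerations: str = ""


def missing_sections(sheet: Datasheet) -> list[str]:
    """Return the names of template fields that are still empty."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


# Hypothetical, partially filled-in datasheet.
sheet = Datasheet(
    name="customer-support-tickets",
    purpose="Train an intent classifier for incoming support tickets.",
    sources="Internal ticketing system export, 2021-2023.",
)
print(missing_sections(sheet))
# → ['curation_method', 'structure', 'transformations', 'limitations', 'known_biases', 'ethical_considerations']
```

A check like this could run in CI so that incomplete documentation is caught before a data set is released, though what counts as "complete" depends on the chosen methodology.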

Data documentation should be treated as a living artifact and versioned accordingly (e.g., using a version control system such as Git). Plain-text formats such as Markdown suit this well: they are easy to read, edit, and diff, and can be converted to other formats (e.g., HTML or PDF) for publication.
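To keep documentation diff-friendly under version control, it can be generated as, or rendered to, plain Markdown. The sketch below assumes the documentation is held as an ordered mapping of section titles to text. The function name `to_markdown`, the section titles, the `Version:` line, and the example content are all illustrative, not a prescribed format.

```python
def to_markdown(title: str, version: str, sections: dict[str, str]) -> str:
    """Render data documentation sections as a Markdown document.

    Empty sections are kept with a TODO marker so gaps stay visible in review.
    """
    lines = [f"# {title}", "", f"Version: {version}", ""]
    for heading, body in sections.items():  # dicts preserve insertion order
        lines.append(f"## {heading}")
        lines.append("")
        lines.append(body.strip() or "_TODO: not yet documented._")
        lines.append("")
    return "\n".join(lines)


# Hypothetical example content.
doc = to_markdown(
    "Datasheet: customer-support-tickets",
    "1.2.0",
    {
        "Purpose": "Train an intent classifier for incoming support tickets.",
        "Sources": "Internal ticketing system export, 2021-2023.",
        "Limitations": "",
    },
)
print(doc)
```

Writing the result to a file (for example, a hypothetical `DATASHEET.md`) and committing it alongside the data pipeline code means each change to the documentation appears as a reviewable diff in the repository history.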

Key Technologies

Several approaches and formats exist for documenting data sets, including:

While these approaches differ in their concrete structure and content, they all aim to provide a comprehensive overview of the data set, including its purpose, sources, curation methods, and any limitations or ethical considerations, in line with the requirements for technical documentation in Annex IV.

Resources

Legal Disclaimer

The information provided on this website is for informational purposes only and does not constitute legal advice. The tools, practices, and mappings presented here reflect our interpretation of the EU AI Act and are intended to support understanding and implementation of trustworthy AI principles. Following this guidance does not guarantee compliance with the EU AI Act or any other legal or regulatory framework. We are not affiliated with, nor do we endorse, any of the tools listed on this website.