
The EU AI Act: An Opportunity for better Data and Governance

Axel Schwanke

Senior Data Engineer | Data Architect | Data Science | Data Mesh | Data Governance | 4x Databricks certified | 2x AWS certified | 1x CDMP certified | Medium Writer | Turning Data into Business Growth | Nuremberg, Germany

Published: September 23, 2024


How modern data platforms such as Databricks can contribute to compliance with Article 10 of the EU AI Act: Data and Data Governance.

The use of an appropriate data platform and governance solution is essential for the development of high-risk AI systems as it helps to ensure data quality, transparency and regulatory compliance.
The EU AI Act provides an excellent opportunity for organizations to adopt data engineering best practices to improve data quality, data management and data governance, thereby strengthening the compliance, reliability, and trustworthiness of AI systems.

This is a slightly updated version of the Medium article.

Introduction

As AI technology advances, the regulatory framework surrounding its development and deployment also evolves. The EU AI Act represents a comprehensive legislative effort that categorizes AI systems by risk levels and imposes strict requirements on high-risk systems.

Article 10 of the Act stipulates that high-risk AI systems must be developed on the basis of training, validation, and testing datasets that are relevant, sufficiently representative and, to the best extent possible, free of errors and complete. This involves effectively managing data to address biases, fill gaps, and ensure relevance to the specific context of use.

Databricks Lakehouse, combined with its integrated Unity Catalog, provides robust solutions to meet these requirements. It offers advanced data governance and management tools that facilitate compliance with data quality standards, enhance transparency, and ensure the secure handling of data, including sensitive personal data. This alignment with the EU AI Act’s stringent standards promotes responsible and compliant AI deployment.

The EU AI Act

The EU AI Act classifies AI systems into four categories: unacceptable risk, high risk, limited risk, and minimal risk. High-risk AI systems are subject to stringent regulations, placing primary responsibility on providers, while users have a secondary role. Additionally, providers of general-purpose AI models must meet specific documentation requirements, and those whose models pose systemic risk must also ensure an adequate level of cybersecurity protection.

The AI Act classifies AI according to its risk:

Unacceptable risk is prohibited (e.g. social scoring systems and manipulative AI).
Most of the text addresses high-risk AI systems, which are regulated.
A smaller section handles limited-risk AI systems, subject to lighter transparency obligations: developers and deployers must ensure that end-users are aware that they are interacting with AI (chatbots and deepfakes).
Minimal risk is unregulated (including the majority of AI applications currently available on the EU single market, such as AI-enabled video games and spam filters, at least in 2021; this is changing with generative AI).

The majority of obligations fall on providers (developers) of high-risk AI systems.

Those that intend to place high-risk AI systems on the market or put them into service in the EU, regardless of whether they are based in the EU or in a third country.
Third-country providers whose high-risk AI system’s output is used in the EU.

Users are natural or legal persons that deploy an AI system in a professional capacity, not affected end-users.

Users (deployers) of high-risk AI systems have some obligations, though less than providers (developers).
This applies to users located in the EU, and to third-country users where the AI system’s output is used in the EU.

High-level summary of the AI Act: This article provides a high-level summary of the AI Act, selecting the parts which are most likely to be relevant.

The excellent Udemy course “EU AI Act Compliance Introduction” offers a comprehensive overview of the EU AI Act and its implications for businesses. It addresses critical topics, including the urgency of AI regulation, prohibited practices, high-risk AI systems, and transparency obligations. With 77 lectures totaling over 10 hours, the course covers compliance responsibilities, AI deception, bias prevention, biometric surveillance, and future innovation under the Act. The course examines frameworks such as risk management, human oversight, and data governance, emphasizing practical implementation and the importance of adhering to regulatory standards.

EU AI Act Compliance Introduction: A Beginner’s Guide to Understanding and Achieving Compliance With European Union Artificial Intelligence Act

Importance of Data

Data readiness is essential for successful AI systems. High-quality, well-organized data prevents issues like incomplete information and inefficiencies. Organizations face challenges like poorly curated data, increasing volumes of transactional data, and the high costs associated with manual data preparation. These obstacles often lead to ineffective AI outputs and impede scalability.

GenAI Data: Is Your Data Ready for Generative AI? Data readiness is the ability to prove the fitness of data for generative AI use cases.

Gartner predicts that at least 30% of generative AI projects will be abandoned after the proof of concept by the end of 2025, due to poor data quality, inadequate risk controls, escalating costs or unclear business value. Poor data quality in particular often prevents organizations from demonstrating clear business value and effectively deploying GenAI, despite substantial investments.

To enhance data readiness, organizations should focus on integrating and securing data across the enterprise, dismantling information silos, and automating data quality processes. Technologies like Retrieval Augmented Generation (RAG) can significantly improve AI performance by providing accurate, context-specific responses through real-time access to reliable internal data, thereby facilitating effective digital transformation and enhancing customer experiences.

Data Governance

We have seen that data is a vital asset for organizations, but its true value is unlocked through effective data governance. This encompasses the principles, practices, and tools necessary to manage the entire data lifecycle while aligning data management with business strategy. A robust governance strategy enhances data management, visibility, and auditing capabilities, ensuring regulatory compliance and safeguarding against unauthorized access. By establishing clear governance frameworks, organizations can improve data quality and integrity, ultimately driving better decision-making and operational efficiency.

Data Governance Challenges

Fragmented Data Landscape: Data silos across various sources hinder efficient data discovery and increase operational costs.
Complex Access Management: Inconsistent access management tools create security issues and complicate collaboration.
Inadequate Monitoring: Lack of comprehensive monitoring impairs audits, impact analyses, and error diagnosis, affecting data quality.
Limited Sharing and Collaboration: No standardized sharing solutions result in data redundancy and inefficiencies.

A unified data, analytics, and AI platform, such as the Databricks Lakehouse Platform — integrated with the Unity Catalog governance framework — can effectively tackle these challenges by streamlining data governance initiatives and enhancing data management and collaboration. This comprehensive approach ensures that organizations can manage their data assets more efficiently while promoting transparency, compliance, and strategic alignment across teams.

Recommendations:

Integrate Data Sources: Consolidate data from various sources like lakes, warehouses, and clouds into one system to reduce inefficiencies and improve data discoverability.
Standardize Access Management: Implement uniform access management across platforms to boost security and streamline audits.
Enhance Monitoring and Visibility: Use comprehensive monitoring tools to track data and AI asset lifecycles, improving audits and data quality.
Adopt a Unified Governance Platform: Employ a single governance platform to standardize and automate processes, managing data and AI assets efficiently and securely.
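To make the monitoring recommendation more concrete: lineage is essentially a directed graph of data and AI assets, and audits or impact analyses are traversals of that graph. Unity Catalog captures lineage automatically; the following is only a minimal plain-Python sketch of the underlying idea, not Unity Catalog's API, and all table and model names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Minimal table-level lineage: each edge points from an upstream to a downstream asset."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def record(self, source: str, target: str) -> None:
        """Record that `target` was derived (at least partly) from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reachable(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal collecting every asset reachable from `start`.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), set()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, asset: str) -> set[str]:
        """Everything `asset` depends on: useful for audits and root-cause analysis."""
        return self._reachable(asset, self._upstream)

    def downstream(self, asset: str) -> set[str]:
        """Everything affected if `asset` changes: useful for impact analysis."""
        return self._reachable(asset, self._downstream)


# Hypothetical table and model names, for illustration only.
lineage = LineageGraph()
lineage.record("raw.crm_customers", "silver.customers")
lineage.record("raw.web_events", "silver.sessions")
lineage.record("silver.customers", "gold.training_set")
lineage.record("silver.sessions", "gold.training_set")
lineage.record("gold.training_set", "model.churn_v1")

print(sorted(lineage.upstream("model.churn_v1")))
# ['gold.training_set', 'raw.crm_customers', 'raw.web_events', 'silver.customers', 'silver.sessions']
print(sorted(lineage.downstream("raw.crm_customers")))
# ['gold.training_set', 'model.churn_v1', 'silver.customers']
```

Walking upstream answers the auditor's question "which data fed this model?", while walking downstream shows which assets need review when a source changes.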

Data and AI Governance: Recommendations from Databricks’ eBook “A Comprehensive Guide to Data and AI Governance”

EU AI Act & Data Governance

Data and data governance are essential for the development and operation of AI systems, ensuring the quality, accuracy, and reliability of data used for training, testing, and validation. Effective data governance plays a crucial role in identifying and mitigating bias, promoting fairness and ethical operation of AI systems. Adherence to legal frameworks such as the EU AI Act and the General Data Protection Regulation (GDPR) is vital for protecting individual privacy and meeting legal standards, thereby enhancing public trust in AI technologies.

AI Act Explained: Navigating Data Governance in the Age of Artificial Intelligence. Dive into the AI Act with a comprehensive overview, designed to guide you through the complexities of data governance…

Article 10 of the EU AI Act focuses on ensuring high-risk AI systems use high-quality data for training, validation, and testing. Data must be carefully managed with regard to collection processes, biases, and gaps, and must be relevant, sufficiently representative and, to the best extent possible, free of errors. Special categories of personal data may be used to detect and correct biases, with strict safeguards to protect individual rights.

Article 17 of the EU AI Act requires providers of high-risk AI systems to establish a documented quality management system. This includes strategies for compliance, design, testing, data management, risk handling, post-market monitoring, and record-keeping, adjusted for the provider’s size and existing obligations.

Now, let’s turn our attention to data and data governance as outlined in Article 10 of the EU AI Act. We will examine how these requirements can be transformed into specific best practices for effective compliance and management in organizations.

Article 10: Data and Data Governance

In this section, we will explore the requirements outlined in Article 10 for high-risk AI systems, focusing on their implications for data platforms and governance frameworks. We will provide practical applications of these requirements using Databricks Lakehouse and Unity Catalog as examples.

Abbreviations used:

‘→ DG’ represents the requirements for a data platform and governance framework, referred to as the DG platform.

‘→ EX’ illustrates how Databricks Lakehouse and Unity Catalog can effectively meet these requirements, referred to simply as Databricks.

Part of Chapter III: High-Risk AI Systems › Section 2: Requirements for High-Risk AI Systems › Article 10: Data and Data Governance

Article 10

This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, error-free, and complete as much as possible. They should also consider the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms.

1. High-risk AI systems which make use of techniques involving the training of AI models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5 whenever such data sets are used.

2. Training, validation and testing data sets shall be subject to data governance and management practices appropriate for the intended purpose of the high-risk AI system. Those practices shall concern in particular ...

→ DG: Utilize a platform that grants AI systems access to high-quality, relevant, and error-free data. This entails comprehensive management of data collection and preparation, proactively addressing data biases and gaps, and ensuring adherence to stringent privacy and protection standards.

→ EX: Databricks fulfills these requirements by offering a unified platform that ensures effective management of high-quality data through robust governance practices. It facilitates seamless integration of data from diverse sources, enforces rigorous data quality checks, and supports comprehensive data lineage tracking. Together with Unity Catalog, Databricks enables compliance with data privacy and security standards, ensuring data remains relevant, accurate, and secure.

Unified Databricks governance architecture
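Data quality checks of this kind are often expressed as declarative rules with an action on violation, similar in spirit to the expect, expect_or_drop and expect_or_fail expectations in Delta Live Tables. The sketch below reproduces that pattern in plain Python under simplifying assumptions (rows as dictionaries, three hypothetical actions: warn, drop, fail); it does not use any Databricks API, and the rules and rows are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


class ExpectationFailed(Exception):
    """Raised when a row violates an expectation whose action is 'fail'."""


@dataclass(frozen=True)
class Expectation:
    name: str
    check: Callable[[Row], bool]
    on_violation: str = "warn"  # "warn": keep row, "drop": discard row, "fail": abort

    def __post_init__(self) -> None:
        if self.on_violation not in {"warn", "drop", "fail"}:
            raise ValueError(f"unknown action: {self.on_violation}")


def apply_expectations(
    rows: List[Row], expectations: List[Expectation]
) -> Tuple[List[Row], Dict[str, int]]:
    """Return the surviving rows plus the number of violations per expectation."""
    violations = {e.name: 0 for e in expectations}
    kept: List[Row] = []
    for row in rows:
        drop = False
        for e in expectations:
            if e.check(row):
                continue
            violations[e.name] += 1
            if e.on_violation == "fail":
                raise ExpectationFailed(f"{e.name} violated by {row!r}")
            if e.on_violation == "drop":
                drop = True
        if not drop:
            kept.append(row)
    return kept, violations


# Hypothetical rules and rows, for illustration only.
rules = [
    Expectation("age_in_range", lambda r: r.get("age") is not None and 18 <= r["age"] <= 120, "drop"),
    Expectation("has_consent_flag", lambda r: "consent" in r, "warn"),
]
rows = [
    {"id": 1, "age": 34, "consent": True},
    {"id": 2, "age": None, "consent": True},
    {"id": 3, "age": 51},
]
clean, report = apply_expectations(rows, rules)
print([r["id"] for r in clean])  # [1, 3]
print(report)  # {'age_in_range': 1, 'has_consent_flag': 1}
```

Keeping the violation counts, and not just the cleaned rows, is what turns a quality check into evidence that can be documented for compliance.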


Article 10 ...

a) the relevant design choices;

b) data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection;

c) relevant data-preparation processing operations, such as annotation, labelling, cleaning, updating, enrichment and aggregation; …

→ DG: Choose a platform that accommodates various data system architectures while providing a robust governance framework, which is crucial for effectively managing high-risk AI systems. It should support all relevant data processing operations — such as annotation, labeling, cleaning, updating, enrichment, and aggregation — both in batch and real-time.

→ EX: Databricks enables the implementation of diverse architectures, including Lakehouse and Data Mesh, through its scalable data management and governance tools. This facilitates efficient data integration and processing across a range of cloud-based environments. Unity Catalog enhances these capabilities by offering fine-grained governance and secure, auditable data access, supporting both centralized and distributed governance models with environment-specific access controls.
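Points a) to c) are largely documentation duties. One way to make them checkable is to keep a structured record per dataset, which in Unity Catalog could, for example, be stored as table comments or tags. The following sketch uses a hypothetical DatasetRecord class, with field names chosen for illustration only, and lists which points of Article 10(2) are not yet documented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical documentation record mirroring Article 10(2), points a) to c)."""

    name: str
    design_choices: List[str]  # a) relevant design choices
    origin: str  # b) origin of the data
    collection_process: str  # b) how the data was collected
    contains_personal_data: bool
    original_purpose: Optional[str] = None  # b) required when personal data is involved
    preparation_steps: List[str] = field(default_factory=list)  # c) labelling, cleaning, ...

    def problems(self) -> List[str]:
        """List documentation gaps, keyed to the point of Article 10(2) they concern."""
        issues = []
        if not self.design_choices:
            issues.append("a) no design choices documented")
        if not self.origin or not self.collection_process:
            issues.append("b) origin or collection process missing")
        if self.contains_personal_data and not self.original_purpose:
            issues.append("b) original purpose of personal data collection missing")
        if not self.preparation_steps:
            issues.append("c) no data-preparation operations documented")
        return issues

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


record = DatasetRecord(
    name="gold.loan_applications_train",
    design_choices=["exclude applications older than five years"],
    origin="internal loan origination system",
    collection_process="nightly batch export",
    contains_personal_data=True,
    preparation_steps=["deduplication", "labelling: default within 12 months"],
)
print(record.problems())  # ['b) original purpose of personal data collection missing']
```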

Article 10 ...

f) examination in view of possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law, especially where data outputs influence inputs for future operations;

g) appropriate measures to detect, prevent and mitigate possible biases identified according to point (f);

→ DG: Use a DG platform that supports the detection and mitigation of potential biases in datasets and AI outputs.

→ EX: Databricks supports bias detection with tools for analyzing datasets and AI results, for example by running open-source explainability libraries such as SHAP on the platform (see the figure below). This helps identify and address biases, supporting fairness and equity in AI systems.

Using SHAP to Visualize Features that Interact with Gender
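The figure uses SHAP feature attributions. A simpler, complementary check for point f) compares outcomes across groups. The sketch below computes per-group selection rates, the demographic parity difference and the disparate impact ratio in plain Python; the group labels and predictions are hypothetical, and the AI Act does not prescribe these or any other particular fairness metric or threshold.

```python
from collections import defaultdict
from typing import Dict, List


def selection_rates(groups: List[str], predictions: List[int]) -> Dict[str, float]:
    """Share of positive predictions (label 1) within each group."""
    if len(groups) != len(predictions):
        raise ValueError("groups and predictions must have the same length")
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def parity_report(groups: List[str], predictions: List[int]) -> Dict[str, float]:
    rates = selection_rates(groups, predictions)
    lowest, highest = min(rates.values()), max(rates.values())
    return {
        # Gap between the most and least favoured group (0 means parity).
        "demographic_parity_difference": highest - lowest,
        # Lowest rate divided by highest rate (1 means parity).
        "disparate_impact_ratio": lowest / highest if highest > 0 else 1.0,
    }


# Hypothetical groups "A" and "B" with 10 predictions each.
groups = ["A"] * 10 + ["B"] * 10
predictions = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
report = parity_report(groups, predictions)
print({k: round(v, 2) for k, v in report.items()})
# {'demographic_parity_difference': 0.3, 'disparate_impact_ratio': 0.5}
```

A ratio below 0.8 is often flagged using the "four-fifths rule" from US employment practice; any threshold adopted for an EU high-risk system remains a design choice the provider has to justify and document.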

Article 10 ...

h) the identification of relevant data gaps or shortcomings that prevent compliance with this Regulation, and how those gaps and shortcomings can be addressed.

→ DG: Choose a platform that allows for the identification of data gaps and other quality deficiencies while facilitating the effective implementation of strategies to address these issues.

→ EX: Databricks offers advanced data quality and management features that enable comprehensive cataloging, monitoring, and evaluation of data assets. These capabilities include identifying and resolving quality deficiencies and data gaps, ensuring effective solutions are in place to maintain data integrity.
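For point h), a first pass at gap detection can be automated: check required columns for missing values and check that every expected population segment is present in sufficient numbers. The thresholds (5% missing values, 30 rows per category) and the column names below are hypothetical assumptions, not values taken from the Act or from Databricks.

```python
from collections import Counter
from typing import Any, Dict, List, Set


def find_data_gaps(
    rows: List[Dict[str, Any]],
    required_columns: List[str],
    category_column: str,
    expected_categories: Set[str],
    max_null_rate: float = 0.05,
    min_rows_per_category: int = 30,
) -> List[str]:
    """Report columns with too many missing values and under-covered categories."""
    if not rows:
        return ["dataset is empty"]
    findings: List[str] = []
    n = len(rows)
    for col in required_columns:
        null_rate = sum(1 for r in rows if r.get(col) is None) / n
        if null_rate > max_null_rate:
            findings.append(f"{col}: {null_rate:.0%} missing (limit {max_null_rate:.0%})")
    counts = Counter(r.get(category_column) for r in rows)
    for cat in sorted(expected_categories):
        if counts[cat] == 0:
            findings.append(f"{category_column}={cat}: no records")
        elif counts[cat] < min_rows_per_category:
            findings.append(f"{category_column}={cat}: only {counts[cat]} records")
    return findings


# Hypothetical rows: 40 from "north", 10 from "south" (half without income), none from "east".
rows = (
    [{"region": "north", "income": 50_000} for _ in range(40)]
    + [{"region": "south", "income": 42_000} for _ in range(5)]
    + [{"region": "south", "income": None} for _ in range(5)]
)
for finding in find_data_gaps(rows, ["income"], "region", {"north", "south", "east"}):
    print(finding)
# income: 10% missing (limit 5%)
# region=east: no records
# region=south: only 10 records
```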

Article 10 ...

3. Training, validation and testing data sets shall be relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose. They shall have the appropriate statistical properties, including, where applicable, as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used. Those characteristics of the data sets may be met at the level of individual data sets or at the level of a combination thereof.

→ DG: Select a platform that enables the creation of comprehensive and error-free training, validation, and test datasets, ensuring that the data exhibits the appropriate statistical properties to accurately represent the intended use of high-risk AI systems.

→ EX: Databricks offers robust data management and governance features, including quality assurance tools that help verify that datasets are complete, free of errors to the best extent possible, and have the necessary statistical properties. This helps ensure that the data is suitable for the intended applications of high-risk AI systems.
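The "appropriate statistical properties" of paragraph 3 can be approximated by comparing the dataset against a reference distribution for the intended users. The sketch below uses the total variation distance and a relative tolerance for under-representation; the reference shares, age bands and 20% tolerance are hypothetical assumptions that a provider would have to justify for its own use case.

```python
from collections import Counter
from typing import Dict, List, Tuple


def representation_report(
    values: List[str], reference: Dict[str, float], tolerance: float = 0.2
) -> Tuple[float, List[str]]:
    """Compare group shares in a dataset with a reference distribution.

    Returns the total variation distance (0 = identical, 1 = disjoint) and the
    groups whose observed share is more than `tolerance` (relative) below the reference.
    """
    if not values:
        raise ValueError("dataset is empty")
    if abs(sum(reference.values()) - 1.0) > 1e-9:
        raise ValueError("reference shares must sum to 1")
    counts = Counter(values)
    groups = set(reference) | set(counts)
    observed = {g: counts[g] / len(values) for g in groups}
    tvd = 0.5 * sum(abs(observed[g] - reference.get(g, 0.0)) for g in groups)
    under = sorted(g for g, share in reference.items() if observed[g] < (1 - tolerance) * share)
    return tvd, under


# Hypothetical age bands: observed training data vs. the intended user population.
values = ["18-34"] * 50 + ["35-64"] * 45 + ["65+"] * 5
reference = {"18-34": 0.3, "35-64": 0.5, "65+": 0.2}
tvd, under = representation_report(values, reference)
print(round(tvd, 2), under)  # 0.2 ['65+']
```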

Article 10 ...

5. To the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems in accordance with paragraph (2), points (f) and (g) of this Article, the providers of such systems may exceptionally process special categories of personal data, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons. In addition to the provisions set out in Regulations (EU) 2016/679 and (EU) 2018/1725 and Directive (EU) 2016/680, all the following conditions must be met in order for such processing to occur:

a) the bias detection and correction cannot be effectively fulfilled by processing other data, including synthetic or anonymised data;

b) the special categories of personal data are subject to technical limitations on the re-use of the personal data, and state-of-the-art security and privacy-preserving measures, including pseudonymisation;

c) the special categories of personal data are subject to measures to ensure that the personal data processed are secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons have access to those personal data with appropriate confidentiality obligations; …
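Condition b) mentions pseudonymisation explicitly. A common technique is keyed hashing: the direct identifier is replaced by an HMAC-SHA256 token, so records of the same person remain linkable for bias analysis without revealing who they are. The sketch below uses only the Python standard library; the field names and records are hypothetical, and it is not presented as the mechanism of any particular platform.

```python
import hashlib
import hmac
import secrets
from typing import Any, Dict, Iterable, List


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed, deterministic pseudonym: same value and key always give the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_records(
    records: Iterable[Dict[str, Any]], id_field: str, key: bytes, drop_fields: Iterable[str] = ()
) -> List[Dict[str, Any]]:
    """Replace the direct identifier with a pseudonym and drop fields not needed for bias analysis."""
    excluded = set(drop_fields) | {id_field}
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k not in excluded}
        clean["pseudo_id"] = pseudonymise(str(rec[id_field]), key)
        out.append(clean)
    return out


# The key must be stored separately from the data (e.g. in a secret store). Whoever holds it
# can recompute tokens for known identifiers, so this is pseudonymisation, not anonymisation.
key = secrets.token_bytes(32)
records = [  # hypothetical records
    {"customer_id": "C-1001", "name": "Alice", "ethnic_origin": "group_x", "approved": 1},
    {"customer_id": "C-1001", "name": "Alice", "ethnic_origin": "group_x", "approved": 0},
    {"customer_id": "C-2002", "name": "Bob", "ethnic_origin": "group_y", "approved": 1},
]
safe = pseudonymise_records(records, "customer_id", key, drop_fields={"name"})
print(sorted(safe[0]))  # ['approved', 'ethnic_origin', 'pseudo_id']
print(safe[0]["pseudo_id"] == safe[1]["pseudo_id"])  # True: records stay linkable
```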

→ DG: To comply with the conditions for the exceptional use of personal data, select a DG platform that facilitates the secure processing of personal data for bias detection. This platform should be integrated across the organization and able to access all relevant data sources. It must enforce stringent safeguards, such as pseudonymization and advanced access controls, while ensuring that data is not shared externally or retained longer than necessary.

→ EX: Databricks enables secure and controlled data processing through its advanced data governance capabilities. Unity Catalog ensures compliance with data protection regulations by implementing rigorous access controls, comprehensive auditing and documentation, and robust data protection mechanisms. This framework upholds the integrity and confidentiality of data throughout its lifecycle.

Security, compliance, and privacy for the data lakehouse
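The safeguards described above (access for authorised persons only, documented access, no retention beyond what is necessary) can be illustrated with a small access guard that logs every request and refuses access once the retention period has ended. In Databricks these controls come from Unity Catalog grants and audit logs; the hypothetical SensitiveDataset class below only illustrates the pattern and does not use their API. Principals, dates and the 90-day retention period are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Set


@dataclass
class SensitiveDataset:
    """Hypothetical guard around a special-category dataset used for bias detection."""

    name: str
    allowed_principals: Set[str]
    purpose: str
    expires_at: datetime  # end of the retention period
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def request_access(self, principal: str, justification: str, now: Optional[datetime] = None) -> None:
        """Grant or deny access; every attempt is logged, including denials."""
        now = now or datetime.now(timezone.utc)
        if now >= self.expires_at:
            outcome = "denied: retention period expired"
        elif principal not in self.allowed_principals:
            outcome = "denied: not authorised"
        else:
            outcome = "granted"
        self.audit_log.append({
            "time": now.isoformat(),
            "dataset": self.name,
            "principal": principal,
            "purpose": self.purpose,
            "justification": justification,
            "outcome": outcome,
        })
        if outcome != "granted":
            raise PermissionError(outcome)


start = datetime(2024, 9, 1, tzinfo=timezone.utc)
ds = SensitiveDataset(
    name="bias_audit.special_category_sample",
    allowed_principals={"fairness-team@example.org"},
    purpose="bias detection and correction",
    expires_at=start + timedelta(days=90),
)
ds.request_access("fairness-team@example.org", "quarterly bias review", now=start)
attempts = [
    ("analyst@example.org", start),
    ("fairness-team@example.org", start + timedelta(days=120)),
]
for who, when in attempts:
    try:
        ds.request_access(who, "ad-hoc query", now=when)
    except PermissionError as err:
        print(who, "->", err)
print(len(ds.audit_log))  # 3
```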

The EU AI Act as an Opportunity …

The EU AI Act presents not only a regulatory challenge but also a significant opportunity for organizations to improve their data processing and governance practices. By setting strict standards for high-risk AI systems — including comprehensive data quality, governance, and management requirements — the AI Act serves as a compelling incentive to implement robust data processing strategies.

Lakehouse Center of Excellence: 4 Key Tenets of a Successful Data and AI Business

First, the AI Act’s emphasis on high-quality datasets encourages the adoption of advanced data engineering techniques such as data lineage tracking, versioning, and validation. These practices ensure that data is accurate, traceable, and reliable. Additionally, the need to address potential biases and data gaps drives organizations to refine their data collection, processing, and monitoring processes.
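Versioning for traceability comes down to being able to show which exact data a model was trained on. The sketch below fingerprints a dataset snapshot with SHA-256 and records the fingerprint per model version; on Databricks, Delta Lake table versions (time travel) and MLflow's dataset tracking can serve a similar purpose. The TrainingRegistry class, model names and rows are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def dataset_fingerprint(rows: List[Dict[str, Any]]) -> str:
    """Order-independent SHA-256 fingerprint of a dataset snapshot."""
    row_hashes = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True, default=str).encode("utf-8")).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()


class TrainingRegistry:
    """Hypothetical registry linking each model version to the exact data it was built from."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}

    def register(self, model_version: str, datasets: Dict[str, List[Dict[str, Any]]]) -> None:
        # Store fingerprints, not references, so later changes to the data are detectable.
        self._entries[model_version] = {
            name: dataset_fingerprint(rows) for name, rows in datasets.items()
        }

    def verify(self, model_version: str, name: str, rows: List[Dict[str, Any]]) -> bool:
        """True if `rows` match the snapshot recorded for this model version."""
        return self._entries[model_version].get(name) == dataset_fingerprint(rows)


train = [{"id": 1, "x": 0.5, "y": 1}, {"id": 2, "x": 0.1, "y": 0}]
registry = TrainingRegistry()
registry.register("churn-model:1", {"train": train, "test": [{"id": 3, "x": 0.7, "y": 1}]})
print(registry.verify("churn-model:1", "train", list(reversed(train))))  # True: row order ignored
train.append({"id": 4, "x": 0.9, "y": 1})
print(registry.verify("churn-model:1", "train", train))  # False: data has changed
```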

Second, the Act’s focus on data governance aligns with best practices in data management. It necessitates the implementation of frameworks that enhance data security, transparency, and compliance, including robust access controls, comprehensive data documentation, and adherence to data privacy regulations.

Ultimately, the Act serves as a catalyst for modernizing data engineering practices, promoting higher standards of data integrity, and fostering a culture of responsible and ethical AI development.

To effectively meet these requirements, organizations should adopt best practices from industry leaders. Below are key data and governance eBooks, guides and best practices from Databricks:

Conclusion

In navigating the complexities of the EU AI Act — especially the stringent data requirements outlined in Article 10 — tools such as Databricks Lakehouse and Unity Catalog are proving to be essential for ensuring compliance. Their integrated data governance frameworks effectively meet the AI Act’s demands for high-quality, representative, and unbiased data, featuring advanced capabilities such as data lineage tracking, fine-grained access controls, and comprehensive data management.

By using such integrated data and governance platforms, organizations can efficiently manage their data assets, mitigate risk, and demonstrate compliance with the strict standards of the EU AI Act. Applying best practices to data processing not only ensures legal compliance, but also promotes trust and transparency in AI systems, paving the way for responsible and ethical AI development.

... so don't just see the EU AI Act as a burden, but use this opportunity to modernize your data infrastructure, increase operational efficiency, and lay the foundation for future-proof, trusted AI solutions.

For a deeper exploration of these topics, join me at the AI Navigator 2024 Conference, where I’ll be sharing comprehensive insights and actionable strategies on data governance best practices: “Managing Compliance: Governance Strategies under the new EU AI Act”

