Good data means Good AI

Introduction

The adage "AI + Bad Data = Bad AI" sums up this entire article: an AI system is only as good as the data it is built on. The success of any AI system depends heavily on the quality and suitability of the data it processes. In highly regulated industries such as life sciences and healthcare, using the right data is not only a best practice but a critical requirement. Adhering to industry standards and regulatory guidelines ensures that data used in AI systems is accurate, reliable, and compliant with legal obligations. This article explores why selecting the right data is crucial, what constitutes the right data, and how to ensure data quality, with practical steps followed by relevant regulatory insights.

Why Is the Choice of the Right Data Important for Any AI System?

Data serves as the foundation for AI systems, influencing their reliability, accuracy, and compliance. If the data fed into an AI system is flawed, incomplete, or biased, the system’s outputs will reflect these issues, leading to poor decision-making, non-compliance, and potential harm.

For instance, if an AI system in pharmaceutical manufacturing uses incorrect data, it could result in products that do not meet safety standards, putting patients at risk. Regulatory bodies such as the FDA and EMA mandate strict data controls to avoid these risks. Beyond compliance, the right data is essential for maintaining organizational reputation, trust, and operational efficiency.

What Is the Right Data for an AI System?

The right data for an AI system is accurate, relevant, complete, and structured appropriately for the system's objectives. This data must be free from bias, secure, and aligned with industry-specific standards. Health authorities expect the right data to be attributable, legible, contemporaneous, original, accurate, complete, consistent, enduring, and available (the ALCOA+ principles). NIST expects the right data to exhibit accuracy, completeness, consistency, timeliness, accessibility, relevance, reliability, integrity, validity, and uniqueness. Both expect data to be evaluated in context. For example, in predictive maintenance, the right data would include detailed historical failure records rather than general operational data. The volume and scalability of the data also play a critical role, as AI models often require large datasets to perform effectively.
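To make the quality dimensions above concrete, here is a minimal sketch of automated record-level checks, loosely inspired by the NIST dimensions mentioned (completeness, validity, consistency, uniqueness). The field names, plausible-range rule, and ISO 8601 timestamp requirement are illustrative assumptions, not part of any standard.

```python
from datetime import datetime

# Illustrative required fields for a hypothetical manufacturing record.
REQUIRED_FIELDS = {"batch_id", "recorded_by", "recorded_at", "temperature_c"}

def check_record(record: dict) -> list[str]:
    """Return a list of data-quality issues found in one record."""
    issues = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:  # completeness
        issues.append(f"missing fields: {sorted(missing)}")
    if "temperature_c" in record and not (-80 <= record["temperature_c"] <= 150):
        issues.append("temperature out of plausible range")  # validity
    if "recorded_at" in record:
        try:
            datetime.fromisoformat(record["recor_at" if False else "recorded_at"] if False else record["recorded_at"])  # consistency of format
        except ValueError:
            issues.append("timestamp not ISO 8601")
    return issues

def check_uniqueness(records: list[dict]) -> list[str]:
    """Flag duplicate batch IDs across a dataset (uniqueness)."""
    seen, dupes = set(), []
    for r in records:
        bid = r.get("batch_id")
        if bid in seen:
            dupes.append(bid)
        seen.add(bid)
    return dupes
```

A real pipeline would run such checks at ingestion and route failing records to a review queue rather than silently dropping them.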

How to Ensure Data Is Accurate, Reliable, Consistent, Integral, and of High Quality for Any AI System

Building a robust AI system starts with a commitment to high-quality data. The success of an AI model hinges not just on sophisticated algorithms but on the integrity, reliability, and quality of the data it processes. Poor-quality data can lead to flawed insights, biased decisions, and significant compliance risks, particularly in regulated industries.

Ensuring that the data fed into an AI system is accurate, reliable, consistent, integral, and of high quality involves a multifaceted approach that spans data governance, security, continuous monitoring, and ethical considerations. By following industry best practices and adhering to regulatory standards, organizations can mitigate risks and enhance the effectiveness of their AI initiatives. Below, I outline the essential steps to achieve this, along with the corresponding regulatory guidelines that ensure compliance and data excellence.

Implement Strong Data Integrity and Governance Practices

Ensuring data integrity and governance prevents unauthorized alterations and maintains data accuracy over time. 21 CFR Part 11 and EU Annex 11 regulations require that electronic records and signatures are trustworthy, reliable, and equivalent to paper records. ISO 9001 emphasizes the need for data that supports continuous improvement in quality management processes. IEC 62304 mandates rigorous data management for software used in medical devices to ensure integrity throughout the software lifecycle.

  • Define clear data governance policies, including roles and responsibilities for data management.
  • Establish and execute procedures to ensure data is accurate, complete, and protected from unauthorized changes.
  • Implement audit trails to track data modifications and access.
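The audit-trail bullet above can be sketched as a tamper-evident log using hash chaining: each entry embeds the hash of the previous one, so any retroactive edit breaks verification. This is only an illustration of the idea; real 21 CFR Part 11 systems require far more (secure user identities, trusted time sources, retention controls).

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditTrail:
    """Append-only log in which each entry is chained to its predecessor's hash."""

    def __init__(self):
        self.entries = []

    def record(self, user: str, action: str, detail: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {
            "user": user,
            "action": action,
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
            "prev": prev_hash,
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)

    def verify(self) -> bool:
        """Recompute every hash; any retroactive edit breaks the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev:
                return False
            if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```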

Prioritize Data Security and Privacy

Protecting data from breaches and ensuring compliance with privacy regulations is essential for maintaining trust and avoiding legal repercussions. ISO/IEC 27001 provides a framework for managing information security risks. GDPR mandates stringent controls over the processing of personal data, ensuring individuals' privacy rights. HIPAA establishes standards for protecting sensitive patient information in the healthcare sector.

  • Implement robust data encryption and access control mechanisms to protect data from unauthorized access.
  • Ensure compliance with data privacy laws by managing sensitive information securely and transparently.
  • Regularly audit data security measures to identify and address potential vulnerabilities.
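The access-control bullet above can be illustrated with a deny-by-default, role-based permission check. The roles, permissions, and the deny-by-default policy are illustrative assumptions; a production system would integrate with an identity provider, enforce encryption at rest and in transit, and log every access decision.

```python
# Illustrative role-to-permission mapping for a hypothetical data platform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "data_steward": {"read", "write"},
    "qa_auditor": {"read", "audit"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles or actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())
```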

Continuously Monitor Data Quality and Establish Feedback Loops

Continuous monitoring ensures that data quality remains high, preventing degradation over time and keeping AI models effective. The NIST AI Risk Management Framework (RMF) recommends continuous assessment and feedback mechanisms to ensure data quality and model performance. ISO/IEC 27001 ensures that data security is maintained during continuous monitoring. IEC 61508 addresses the functional safety of electronic systems, emphasizing the need for ongoing data quality checks.

  • Set up real-time monitoring tools to continuously assess data quality, identifying errors, inconsistencies, or anomalies as they occur.
  • Implement feedback loops to update and refine AI models based on new data and insights.
  • Regularly review the performance of AI systems and adjust data inputs to maintain accuracy and relevance.
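A minimal sketch of the real-time monitoring step above: flag incoming values that drift far from a reference window using a simple z-score. The threshold and the choice of z-score are illustrative assumptions, not prescribed by any guideline; production monitoring would use richer drift statistics and alerting.

```python
from statistics import mean, stdev

def drift_alerts(reference: list[float], incoming: list[float],
                 z_max: float = 3.0) -> list[float]:
    """Return incoming values more than z_max standard deviations
    from the mean of the reference window."""
    mu, sigma = mean(reference), stdev(reference)
    if sigma == 0:
        # Constant reference: any deviation at all is an anomaly.
        return [x for x in incoming if x != mu]
    return [x for x in incoming if abs(x - mu) / sigma > z_max]
```

Alerts from such a check would feed the feedback loop: flagged data is reviewed, and either the data pipeline or the model's expectations are corrected.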

Cleanse and Normalize Data Before Use

Clean, consistent data is essential for producing accurate and reliable AI outputs. NIST Data Quality Guidelines stress the importance of eliminating inconsistencies and errors before data is used in AI. IEEE 1012 provides verification and validation (V&V) standards to ensure data meets predefined quality criteria.

  • Conduct thorough data cleansing to remove errors, duplicates, and irrelevant information from datasets.
  • Normalize data to ensure consistency in formatting, units of measurement, and categorization across datasets.
  • Validate cleansed and normalized data to ensure it meets quality standards before feeding it into AI systems.
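The cleansing and normalization steps above can be sketched as follows: normalize casing and units first, then deduplicate, so that records differing only in formatting collapse to one. The column names and the Fahrenheit-to-Celsius target are assumptions about a hypothetical dataset.

```python
def normalize_record(rec: dict) -> dict:
    """Normalize one record: consistent casing and a single temperature unit."""
    out = dict(rec)
    out["site"] = out["site"].strip().upper()   # consistent categorization
    if out.get("unit") == "F":                  # one unit of measurement
        out["value"] = round((out["value"] - 32) * 5 / 9, 2)
        out["unit"] = "C"
    return out

def cleanse(records: list[dict]) -> list[dict]:
    """Normalize, then drop exact duplicates while preserving order."""
    seen, cleaned = set(), []
    for rec in records:
        norm = normalize_record(rec)
        key = tuple(sorted(norm.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(norm)
    return cleaned
```

Note that normalization runs before deduplication by design: " berlin / 212 F" and "BERLIN / 100 C" are the same observation and should survive as one record.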

Ensure Data Provenance and Traceability

Understanding where data comes from and how it has been processed is crucial for maintaining trust and accountability. ICH Q7 and PIC/S Guidelines require detailed documentation to ensure data lineage and traceability. ISO 17025 mandates that testing and calibration laboratories verify and trace their data sources to ensure reliability.

  • Document the origin of data, including its source, collection methods, and any transformations it undergoes.
  • Maintain records that allow data to be traced back to its original source, ensuring transparency and accountability.
  • Implement version control to track changes in data over time, ensuring that all modifications are documented.
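The provenance steps above can be sketched as a dataset object that records its source and appends a lineage entry, with a content hash, for every transformation. The step names and truncated-hash scheme are illustrative assumptions; real systems would also capture who ran each step and when.

```python
import hashlib
import json

def content_hash(data) -> str:
    """Short, deterministic fingerprint of JSON-serializable data."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()[:12]

class Dataset:
    """Data plus an append-only lineage of every step that produced it."""

    def __init__(self, data, source: str):
        self.data = data
        self.lineage = [{"step": f"ingested from {source}",
                         "hash": content_hash(data)}]

    def transform(self, step: str, fn):
        """Apply fn and record the step plus the resulting content hash."""
        self.data = fn(self.data)
        self.lineage.append({"step": step, "hash": content_hash(self.data)})
        return self
```

Because each lineage entry carries a content hash, any version of the dataset can be checked against its recorded state and traced back to its original source.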

Adhere to Ethical Data Use and AI Governance Principles

Ethical data use is crucial for maintaining public trust, avoiding bias, and ensuring fair decision-making by AI systems. NIST AI RMF provides guidelines for developing AI systems that align with ethical standards. IEEE Standards for AI Ethics emphasizes transparency, accountability, and fairness in AI data practices. 21 CFR Part 11 and EU Annex 11 reinforce the importance of adhering to ethical principles in GxP-regulated environments.

  • Establish clear ethical guidelines for the use of data in AI, focusing on fairness, transparency, and accountability.
  • Implement governance frameworks that monitor and enforce these ethical guidelines, particularly in decision-making processes.
  • Regularly assess AI systems for potential biases and adjust data inputs or models to mitigate any identified issues.
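The bias-assessment bullet above can be illustrated by comparing an AI system's positive-outcome rates across groups. The ratio below echoes the common "four-fifths rule" heuristic; the group labels are illustrative, and a real assessment would use multiple fairness metrics, not one number.

```python
def selection_rates(outcomes: list[tuple[str, bool]]) -> dict:
    """outcomes: (group, positive_outcome) pairs -> positive rate per group."""
    totals, positives = {}, {}
    for group, positive in outcomes:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(positive)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(outcomes: list[tuple[str, bool]]) -> float:
    """Ratio of the lowest to the highest group rate (1.0 = parity)."""
    rates = selection_rates(outcomes).values()
    return min(rates) / max(rates) if max(rates) > 0 else 0.0
```

A ratio well below 1.0 would trigger the adjustment step above: revisit the training data or the model until group rates are defensible.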

Regularly Review and Audit Data

Regular reviews and audits help sustain long-term data quality, ensuring ongoing compliance and reliability. NIST Guidelines advocate for continuous assessment and periodic review of data quality.

  • Schedule periodic data reviews to ensure that datasets remain accurate, complete, and relevant over time.
  • Conduct regular audits to verify that data management practices align with regulatory requirements and internal policies.
  • Document audit results and take corrective actions where necessary to maintain data quality and integrity.

Conclusion

Ensuring that data used in AI systems is accurate, reliable, consistent, integral, and of high quality requires a comprehensive approach. By implementing strong data governance, prioritizing security and privacy, continuously monitoring data quality, and adhering to ethical principles, organizations can build AI systems that are not only effective but also compliant with regulatory requirements. By following the steps outlined above, supported by industry standards and regulatory guidelines, organizations can confidently navigate the complexities of data management in AI, ensuring that their systems deliver accurate, trustworthy, and valuable outcomes.

References

21 CFR Part 11, EU Annex 11

ISO 9001:2015

ISO/IEC 27001:2013, ISO/IEC 17025:2017

IEC 62304:2006, 61508:2010

ICH Q7, PIC/S guidelines

GDPR, HIPAA

NIST AI Risk Management Framework (AI RMF)

IEEE 1012-2016; IEEE standards for AI ethics


Disclaimer: The article is the author's point of view on the subject, based on his understanding and interpretation of the regulations and their application. Do note that AI has been leveraged for the article's first draft to build an initial story covering the points provided by the author. The author has since reviewed, updated, and appended to it to ensure accuracy and completeness to the best of his ability. Please review this before using it for its intended purpose. It is free for anyone to use as long as the author is credited for the piece of work.



Christian Schmitz-Moormann

Relentlessly building tomorrow with great people, robust processes and cutting-edge validated systems for strictly regulated industries.

2 months ago

Hi Ankur, great overview, thanks for putting it out. I would like to point to two topics which are probably complex, but if considered well will support your outlined approach. One is time: in areas like pharmacovigilance, data has been collected and curated carefully for years, but over time the standards have changed, and we still see changes. Data that was considered good a few years ago may not meet today's quality standards. We need to be able to handle this aspect, as the old data may also have high value and should remain usable. Two is documents: a lot of valuable information is stored in documents, and we see approaches for making this information accessible and usable as data. Most of these approaches treat documents as monolithic entities without inherent structure. And, frankly, most documents do have an inherent structure. Losing this structural information leads to lower-than-possible data quality.

Anvesh Jupaka

Validation Lead - IT CSV @ Spotline Inc | Quality Compliance - IT | Cloud Compliance | SaaS Validation | SAP S4 HANA | AI - ML

2 months ago

Great analysis. Thanks for sharing.
