How IBM is building responsible AI with a data governance-first approach


Building trustworthy AI depends on trustworthy data. And this starts with data provenance, a method of bringing transparency to the origin of datasets used in both traditional data applications and AI applications. Yet the ecosystem lacked a common language for providing that transparency.

This is why IBM, as part of the Data & Trust Alliance (D&TA), helped co-create the first cross-industry data provenance standards. In early 2024, IBM tested these standards as part of its clearance process for datasets used to train foundation models. As a result, it saw gains in both efficiency (time to clearance) and overall data quality.

And with these positive results, IBM is now working to align Business Data Standards with the D&TA Data Provenance Standards where appropriate to further optimize enterprise data governance.


Data provenance protects the integrity and reliability of data within an organization by meticulously documenting the history of data: where it came from, how it was transformed and how it moved through various processes. This historical context supports regulatory compliance, because it helps safeguard the accuracy and legitimacy of data and helps organizations show that they meet legal and industry standards. Data provenance also enhances transparency and accountability in data handling, a crucial aspect of cybersecurity.
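To make "documenting the history of data" more concrete, here is a minimal Python sketch of an append-only lineage log. It assumes a hypothetical schema (`LineageEvent`, `LineageLog`, and the example dataset name and path are all made up) in which each event stores the SHA-256 digest of the previous event, so quietly editing an earlier entry breaks the chain. Hash chaining is one common tamper-evidence technique; it is not a mechanism prescribed by the D&TA standards.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageEvent:
    """One step in a dataset's history (hypothetical schema)."""
    dataset_id: str
    action: str      # e.g. "ingested", "deduplicated", "filtered"
    actor: str       # system or person responsible for the step
    details: dict    # free-form, JSON-serializable parameters of the step
    prev_hash: str   # digest of the previous event; "" for the first one
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def digest(self) -> str:
        # Hash a canonical JSON encoding so any later edit changes the digest.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LineageLog:
    """Append-only record of where a dataset came from and how it changed."""

    def __init__(self) -> None:
        self.events: list[LineageEvent] = []

    def record(self, dataset_id: str, action: str, actor: str, **details) -> LineageEvent:
        # Link the new event to the digest of the most recent one.
        prev = self.events[-1].digest() if self.events else ""
        event = LineageEvent(dataset_id, action, actor, details, prev)
        self.events.append(event)
        return event

    def verify(self) -> bool:
        # Walk the chain: each event must reference the digest of its predecessor.
        expected = ""
        for event in self.events:
            if event.prev_hash != expected:
                return False
            expected = event.digest()
        return True


if __name__ == "__main__":
    log = LineageLog()
    log.record("web-text-v1", "ingested", "crawler", source="s3://raw-bucket/web-text")
    log.record("web-text-v1", "deduplicated", "dedup-job", removed_rows=1200)
    print(log.verify())  # True
    log.events[0].details["source"] = "somewhere-else"  # rewrite history
    print(log.verify())  # False
```

The chain only protects events that have a successor, so in practice the latest digest would also be stored somewhere outside the log (for example, alongside the published dataset).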

"It would be a game changer if organizations could agree on a consistent [data provenance] methodology and framework to use end-to-end across the data ecosystem."

- Lee Cox, VP Integrated Governance and Market Readiness, IBM Office of Privacy and Responsible Technology

The data provenance standards are essential for documenting the origin and lifecycle of data, providing transparency that drives accuracy, reliability, and trustworthiness across all industries.

Co-developed by 19 member organizations of the Data & Trust Alliance (D&TA), these standards establish a uniform approach to increasing transparency in datasets, thereby enhancing the integrity of, and trust in, both data and the AI it feeds. The D&TA, comprising chief technology officers, chief data officers, and leaders in data acquisition, data governance, data strategy, data quality, legal, and compliance, set out to create a robust and relevant set of standards.
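A uniform standard means every dataset arrives with the same set of documented fields, so a consumer can check completeness mechanically before clearance begins. The sketch below assumes a small, hypothetical set of fields; `ProvenanceMetadata`, `missing_fields`, `ready_for_clearance` and the field names are illustrative only and are not the official D&TA schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ProvenanceMetadata:
    """Illustrative provenance record. Field names are hypothetical
    and are not the official D&TA schema."""
    dataset_title: Optional[str] = None
    source_organization: Optional[str] = None
    origin_geography: Optional[str] = None
    collection_method: Optional[str] = None   # e.g. "web crawl", "synthetic"
    license: Optional[str] = None
    intended_use: Optional[str] = None
    contains_personal_data: Optional[bool] = None  # False is a valid answer; None means unknown


def missing_fields(meta: ProvenanceMetadata) -> list[str]:
    """Return the names of fields that were never documented (still None)."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


def ready_for_clearance(meta: ProvenanceMetadata) -> bool:
    # Clearance review starts only once every field has been documented.
    return not missing_fields(meta)


if __name__ == "__main__":
    meta = ProvenanceMetadata(dataset_title="Support tickets 2023", license="internal")
    print(missing_fields(meta))
    # ['source_organization', 'origin_geography', 'collection_method',
    #  'intended_use', 'contains_personal_data']
    print(ready_for_clearance(meta))  # False
```

Using `None` for "not yet documented" keeps an explicit answer such as `contains_personal_data=False` distinct from a missing one, which is the distinction a clearance reviewer cares about.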

“These practical standards, co-created by senior practitioners across industry, are designed to help evaluate whether AI workflows align with ever-changing regulations while also helping generate increased business value.”

- Rob Thomas, SVP Software and CCO at IBM

The group tested and validated the standards across diverse scenarios, including traditional data acquisition, synthetic data tracking in ad tech, and governance for large language models. This cross-industry approach, encompassing companies of all sizes, ensured the standards could meet a broad range of stakeholder needs.

Industry experts and governance groups also validated the standards, further solidifying their applicability and effectiveness.

The goal of data governance is to maintain safe, high-quality data that is easily accessible for data discovery and business intelligence initiatives. Acting rather like an air traffic control hub, the data governance function helps ensure that verified data flows through secured pipelines to trusted endpoints and users. Artificial intelligence (AI), big data and digital transformation efforts are the primary drivers of data governance programs. As the volume of data increases from new data sources, such as Internet of Things (IoT) technologies, organizations need to reconsider their data management practices to scale their business intelligence (BI) efforts.

Data governance programs can help organizations protect and manage large amounts of data by improving data quality, reducing data silos, enforcing compliance and security policies and distributing data access appropriately.
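"Distributing data access appropriately" often comes down to a deny-by-default policy. The minimal sketch below assumes a hypothetical role table (`ROLE_POLICY`, the role names and the three sensitivity levels are all illustrative) that lists the business domains each role may read and the highest sensitivity level it is cleared for. Real deployments would typically enforce this in a data catalog or database layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    domain: str        # owning business domain, e.g. "finance"
    sensitivity: str   # "public", "internal" or "restricted"


# Hypothetical ordering of sensitivity levels, lowest to highest.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical policy: role -> (domains it may read, highest level it is cleared for).
ROLE_POLICY = {
    "marketing_analyst": ({"marketing", "sales"}, "internal"),
    "finance_controller": ({"finance"}, "restricted"),
    "data_scientist": ({"marketing", "sales", "finance"}, "internal"),
}


def can_read(role: str, dataset: Dataset) -> bool:
    """Deny by default; allow only if the role covers the domain and sensitivity."""
    if role not in ROLE_POLICY:
        return False
    domains, max_level = ROLE_POLICY[role]
    return (
        dataset.domain in domains
        and SENSITIVITY_RANK[dataset.sensitivity] <= SENSITIVITY_RANK[max_level]
    )


if __name__ == "__main__":
    payroll = Dataset("payroll", "finance", "restricted")
    print(can_read("finance_controller", payroll))  # True
    print(can_read("data_scientist", payroll))      # False: cleared only up to "internal"
```

Unknown roles receive nothing, so adding a new team requires an explicit policy entry rather than silently inheriting access.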

Implementing a strong data governance framework can help organizations realize a wide variety of benefits:

Get more value from enterprise data: Data governance can help ensure data integrity, accuracy, completeness and consistency by creating a framework that supports robust data stewardship and a strong end-to-end data management process. Trustworthy data helps organizations discover new opportunities, better understand their customers and workflows, and optimize overall business performance.

Promote innovation and efficiency: When data access is restricted across an organization, it can limit innovation, create dependencies on subject matter experts (SMEs) and slow business processes. Data governance programs distribute data access appropriately, giving each department or individual access only to the data they need. This enables cross-functional teams to work together more closely and efficiently while keeping data safe.

Provide a single source of truth (SSOT): A properly governed data system can provide a single source of truth across an entire organization. Decision-making can be improved when all parties are working with the same data sets.

Help ensure data privacy, security and compliance: Data governance tools help organizations set guardrails that can prevent data breaches, leaks and misuse. Governance frameworks help build data systems that are clear, explainable, fair and inclusive. In turn, these data systems safeguard privacy and security and maintain customer loyalty and trust.

Securely use data for AI initiatives: Data governance involves understanding the origin, sensitivity and lifecycle of all the data that an organization uses. This is the foundation for any AI governance practice and is crucial in mitigating various enterprise risks. Data governance helps organizations bring high-quality data to AI and ML initiatives while protecting that data and complying with relevant rules and regulations.

Enable more accurate data analytics: Having the right data is the foundation for advanced data analytics and data science initiatives. Carefully governed data enables valuable initiatives such as business intelligence reporting or more complex predictive machine learning (ML) projects.
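Several of the benefits above, such as data completeness and consistency and the safe use of data for AI, are often enforced as automated checks before data is released. The sketch below is hypothetical throughout: `check_quality` measures completeness over required columns and applies per-column validation rules, and `approved_for_ai` is an illustrative gate that rejects datasets with unknown provenance, undocumented consent for personal data, or completeness below an assumed 95% threshold.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    completeness: float                               # share of required cells that are filled
    invalid_rows: list = field(default_factory=list)  # indexes of rows failing a rule


def check_quality(rows: list[dict], required: list[str], rules: dict) -> QualityReport:
    """Measure completeness and apply per-column validation rules.

    `rules` maps a column name to a predicate that returns True for valid values.
    Empty values count against completeness but are not re-checked by the rules.
    """
    total = len(rows) * len(required)
    filled = sum(1 for row in rows for col in required if row.get(col) not in (None, ""))
    invalid = [
        i for i, row in enumerate(rows)
        if any(row.get(col) not in (None, "") and not ok(row[col])
               for col, ok in rules.items())
    ]
    return QualityReport(filled / total if total else 1.0, invalid)


def approved_for_ai(report: QualityReport, has_provenance: bool,
                    contains_personal_data: bool, consent_recorded: bool,
                    min_completeness: float = 0.95) -> bool:
    """Hypothetical gate applied before a dataset is used to train or tune a model."""
    if not has_provenance:
        return False  # unknown origin: cannot be cleared
    if contains_personal_data and not consent_recorded:
        return False  # privacy obligations not documented
    return report.completeness >= min_completeness and not report.invalid_rows


if __name__ == "__main__":
    rows = [
        {"customer_id": "C1", "country": "DE", "age": 34},
        {"customer_id": "C2", "country": "", "age": 29},
        {"customer_id": "C3", "country": "FR", "age": -5},
    ]
    report = check_quality(
        rows,
        required=["customer_id", "country", "age"],
        rules={"age": lambda a: 0 <= a <= 120},
    )
    print(report)  # completeness 8/9, invalid_rows [2]
    print(approved_for_ai(report, has_provenance=True,
                          contains_personal_data=True, consent_recorded=True))  # False
```

Keeping the quality metrics and the approval decision in separate functions lets different teams tune thresholds without changing how quality is measured.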

