Data Management for AI

Guna Jayaseelapandian

Data and Analytics Leader

Published: August 12, 2020

AI’s use in decision-making is ubiquitous today. Consider three examples: the autopilot feature that lets a Tesla change lanes without driver intervention, the flagging of potentially suspicious money-laundering transactions on a banking platform, and FDA decisions on clinical trials for a vaccine. Each relies on the quality of the data used to train its machine learning models. Each also relies on the quality of the real-world data those models then use to make predictions.

Whether an organization uses external data or sources it internally, good-quality data is never a given. Even today, organizations struggle to ensure that the right data, of adequate quality, is used for the right purpose.

Traditional data management

Previously, the data life-cycle catered to manual decision-making, mostly via BI reports. A typical process to manage data operations for traditional analytical needs is shown below.

But today, companies are increasingly held accountable by laws, rules and regulations, and by internal stakeholders. They must ensure that AI-derived insights driving regulatory compliance and other business decisions are auditable for source data quality and overall efficacy.
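To make "auditable for source data quality" more concrete, here is a minimal sketch in Python. It assumes tabular records held as a list of dictionaries. The names `check_quality` and `QualityReport`, the null and validity rules, and the sample transactions are all hypothetical illustrations, not a prescribed standard. Each check emits a timestamped report tied to a SHA-256 fingerprint of the exact records checked, so an auditor can later confirm which inputs a model decision relied on.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class QualityReport:
    dataset_fingerprint: str  # SHA-256 of the exact records checked
    row_count: int
    null_counts: dict         # missing values per required column
    invalid_counts: dict      # values failing a validity rule, per column
    checked_at: str           # UTC timestamp of the check
    passed: bool


def fingerprint(rows):
    # Canonical JSON (sorted keys) gives a stable hash for the same records
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_quality(rows, required, validators, max_null_ratio=0.0):
    # Hypothetical helper: count nulls and rule violations, then pass/fail
    null_counts = {col: 0 for col in required}
    invalid_counts = {col: 0 for col in validators}
    for row in rows:
        for col in required:
            if row.get(col) is None:
                null_counts[col] += 1
        for col, is_valid in validators.items():
            value = row.get(col)
            # Missing values are counted as nulls above, not as invalid
            if value is not None and not is_valid(value):
                invalid_counts[col] += 1
    n = len(rows)
    nulls_ok = all(count <= max_null_ratio * n for count in null_counts.values())
    values_ok = all(count == 0 for count in invalid_counts.values())
    return QualityReport(
        dataset_fingerprint=fingerprint(rows),
        row_count=n,
        null_counts=null_counts,
        invalid_counts=invalid_counts,
        checked_at=datetime.now(timezone.utc).isoformat(),
        passed=nulls_ok and values_ok,
    )


# Hypothetical sample: one negative amount and one missing country
transactions = [
    {"id": 1, "amount": 120.0, "country": "US"},
    {"id": 2, "amount": -5.0, "country": "GB"},
    {"id": 3, "amount": 40.0, "country": None},
]
report = check_quality(
    transactions,
    required=["id", "amount", "country"],
    validators={"amount": lambda v: v >= 0},
)
print(json.dumps(asdict(report), indent=2))
```

Storing each report alongside the model run that consumed the data is one way to keep such an audit trail. In practice, a dedicated data quality tool might replace these hand-written checks.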

Challenges

Given the growing importance of data quality and availability, companies have to rethink how data is managed across the organization. Traditionally, data management has catered to Business Intelligence (BI), which contrasts with today’s world of Big Data and machine learning at scale.

Some of the key challenges forcing companies to rethink data management are:

Reliability and consistency of source data
Hard to explain models (e.g. CNNs)
Requirements to keep lineage information
Population data and model validation
Data privacy
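As a rough illustration of the lineage requirement, the sketch below records three things for every pipeline step: a content hash of its output, the transformation applied, and links to its parent steps. It assumes in-memory Python data. `LineageNode`, `source`, `run_step` and the dataset name `core_banking.transactions` are hypothetical names chosen for the example. Walking the links back from a model's feature set yields a trail down to the raw sources.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def content_hash(data) -> str:
    # Stable SHA-256 over canonical JSON, so identical outputs hash identically
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class LineageNode:
    name: str
    output_hash: str
    transformation: str
    parents: List["LineageNode"] = field(default_factory=list)

    def trace(self) -> List[str]:
        # Depth-first walk back to the raw sources, indenting one level per hop
        lines = [f"{self.name} <- {self.transformation} [{self.output_hash[:12]}]"]
        for parent in self.parents:
            lines.extend("  " + line for line in parent.trace())
        return lines


def source(name, data):
    # A raw input has no parents
    return data, LineageNode(name, content_hash(data), "raw source")


def run_step(name, transformation, func, *inputs):
    # Each input is a (data, node) pair; apply func to the data and link the nodes
    data = func(*(d for d, _ in inputs))
    node = LineageNode(name, content_hash(data), transformation, [n for _, n in inputs])
    return data, node


# Hypothetical two-step pipeline feeding a model feature
raw = source("core_banking.transactions", [{"id": 1, "amount": 120.0}, {"id": 2, "amount": -5.0}])
clean = run_step("clean_transactions", "drop negative amounts",
                 lambda rows: [r for r in rows if r["amount"] >= 0], raw)
features = run_step("txn_features", "sum amounts",
                    lambda rows: {"total_amount": sum(r["amount"] for r in rows)}, clean)

print("\n".join(features[1].trace()))
```

Hashing outputs rather than only naming them means a trail can show not just which step produced a feature but which exact version of the data it came from.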

Companies now face not just data governance but also model governance. The Open Risk Manual defines model governance as the overall internal framework of a firm or organization that controls the processes for model development, validation and usage, and assigns responsibilities and roles, among other things.

Although model governance may look like a separate process on the surface, it also affects the data life-cycle and should be treated as one of the requirements feeding into operations. Why? Because models do not exist in isolation from the data they were built from and act upon. In fact, good data governance goes hand in hand with good model governance, and the two processes overlap considerably.

Towards better data management for AI

So how do we refactor our existing data life-cycle to include model governance? A good approach should weigh various internal factors, but the key is to include AI and model governance requirements upfront. That way, in each agile cycle, any change to the business requirements also goes through a model governance update. A simplified, high-level version of the new process could look like the one shown below.
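One way to read "any changes to the business requirements also go through a model governance update" is as a release gate in the delivery pipeline. The sketch below is hypothetical. `GovernanceRecord`, `ReleaseCandidate`, `release_blockers`, the version labels and the approver value are illustrative names, not part of any specific framework. The gate blocks a release whenever the requirements version delivered in the current cycle differs from the version last covered by a governance review. It also blocks the release when validation, approval or data quality checks are missing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GovernanceRecord:
    model_id: str
    requirements_version: str  # business requirements version last reviewed
    validation_passed: bool
    approver: str              # accountable owner of the review


@dataclass
class ReleaseCandidate:
    model_id: str
    requirements_version: str  # business requirements version this cycle delivers
    data_quality_passed: bool


def release_blockers(candidate: ReleaseCandidate, record: GovernanceRecord) -> List[str]:
    """Return the reasons a release must wait; an empty list means it may proceed."""
    blockers = []
    if record.model_id != candidate.model_id:
        blockers.append("governance record belongs to a different model")
    if record.requirements_version != candidate.requirements_version:
        blockers.append("business requirements changed since last governance review")
    if not record.validation_passed:
        blockers.append("model validation has not passed")
    if not record.approver:
        blockers.append("no accountable approver recorded")
    if not candidate.data_quality_passed:
        blockers.append("source data quality checks failed")
    return blockers


# Hypothetical cycle: requirements moved from req-v3 to req-v4 without a new review
record = GovernanceRecord("aml-scoring", "req-v3", validation_passed=True, approver="model-risk-team")
candidate = ReleaseCandidate("aml-scoring", "req-v4", data_quality_passed=True)
print(release_blockers(candidate, record))
```

Running it prints the single blocker about changed requirements. Once the governance review is updated to cover `req-v4`, the list comes back empty and the release can proceed.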

There are other aspects to consider, such as personnel and the integration points between the two processes, which I haven't delved into here for simplicity’s sake. These vary in scale and complexity depending on the type of organization, its maturity, and so on.

Data management for AI is becoming increasingly complex given the myriad technologies and applications in today’s world. It is important that companies manage data pipelines for AI formally, so that the pipelines can be better maintained and support long-term success.
