Thank you for reading my latest article.
Here at LinkedIn, I regularly write about data architecture, business architecture, business concepts and technology trends. To read my future articles, simply join my Newsletter on LinkedIn or click 'Follow'. Also, feel free to connect with me on LinkedIn.
Data Quality is often a topic of interest and debate. There are multiple schools of thought around how, when, where and with what Data Quality design and implementation should be done. There are many open-source and licensed platforms and frameworks out there that will help you design your Data Quality solution. But it is important to know from the outset what your Data Quality requirements are, what capabilities you need to build and what your design principles are. Please keep in mind that standing up a Data Quality solution can be a costly undertaking for organizations that already have a lot of production workloads and a heavy data footprint. Whatever approach you choose should be holistic, scalable and flexible for future expansion.
In this article, I want to talk about some of those key points, starting with the design principles:
- Support for multiple Lines of Business: You may start the initial design and implementation with a specific Line of Business, but the framework should be easily adoptable by other Lines of Business within the enterprise.
- Support for Multiple Applications, Systems of Record and Data Domains: Each application, SOR and Data Domain may have different needs for Data Quality, SLAs, reporting and notifications. The design should accommodate these variations without per-application rework.
- Support for Hybrid Architectures: The DQ Framework should be agnostic to whether the data design is cloud-native, on-prem or hybrid.
- Support for Rule Mastering: Rule authoring, review and approval should involve both Technology and Data Management teams.
- Support for Configuration-Driven Development: The core Rule Engine should be generic, and new DQ Rule development and onboarding should be configuration driven (a sample rule configuration is sketched after this list).
- Support for Notifications: Support multiple notification mechanisms for Data Management, Technology, Application and Business stakeholders.
- Support for Incident Management: Any remediation workflow should integrate with the enterprise Incident Management platform.
- Support for Automation, Orchestration and Scheduling: All DQ processes and executions should be automated.
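To make the configuration-driven principle concrete, here is a minimal sketch of what a single DQ rule configuration might look like in JSON. Every field name here (rule_id, rule_type, threshold_pct and so on) is an illustrative assumption, not a prescribed schema:

```json
{
  "rule_id": "CUST_EMAIL_NOT_NULL",
  "data_domain": "Customer",
  "data_source": "crm_db",
  "dataset": "customers",
  "rule_type": "completeness",
  "target_column": "email",
  "severity": "high",
  "threshold_pct": 99.5,
  "notify": ["dq-team@example.com"],
  "schedule": "daily"
}
```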
Now let's talk about each of these points:
- Data Quality Rule Mastering: The design should support DQ configurations per Data Domain, Data Source and Dataset. It should support simple, technology-friendly configuration formats such as JSON or XML. Alternatively, Data Management teams and Data Stewards should have an easy-to-use Web UI to enter, review and authorize/approve DQ configurations; this abstracts them from the underlying configuration format. Behind the scenes, data entered through this Web application should still be converted to the standard configuration format. These configurations should trigger automated CI/CD pipelines that load the translated DQ Rules into the DQ Rules Repository.
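As a sketch of that CI/CD loading step, the snippet below validates a rule configuration file and upserts it into a rules repository. SQLite stands in for whatever repository you actually choose, and the required-field list and dq_rules table are assumptions for illustration (the table itself is defined in the repository sketch under the next bullet):

```python
import json
import sqlite3  # SQLite as a stand-in for the enterprise DQ Rules Repository

REQUIRED_FIELDS = {"rule_id", "data_domain", "dataset", "rule_type", "target_column"}

def load_rule(config_path: str, conn: sqlite3.Connection) -> None:
    """Validate one DQ rule configuration and upsert it into the repository.
    This is the kind of step a CI/CD pipeline would run on each config change."""
    with open(config_path) as f:
        rule = json.load(f)
    missing = REQUIRED_FIELDS - rule.keys()
    if missing:
        raise ValueError(f"Invalid DQ rule config, missing fields: {sorted(missing)}")
    conn.execute(
        "INSERT OR REPLACE INTO dq_rules (rule_id, definition) VALUES (?, ?)",
        (rule["rule_id"], json.dumps(rule)),
    )
    conn.commit()
```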
- DQ Rules Repository: The design should support cloud-native, on-prem or hybrid solutions. It should be compatible with all three major cloud providers: AWS, GCP and MS Azure. You can also choose cloud-native data platforms like Snowflake and Databricks. Essentially, the design should support NoSQL, relational, Lakehouse and Data Lake design patterns. A few choices you may have are SQL Server, Oracle, AWS RDS, AWS Athena, AWS Redshift and so on.
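One possible relational layout for the repositories discussed in this article is sketched below, with SQLite purely as a stand-in; the same three tables (rules, results, exceptions) map onto RDS, Redshift, Snowflake or any other platform named above. Table and column names are illustrative assumptions:

```python
import sqlite3

# SQLite used purely as a stand-in for the enterprise repository choice.
conn = sqlite3.connect("dq_repository.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dq_rules (
    rule_id     TEXT PRIMARY KEY,
    definition  TEXT NOT NULL,          -- full JSON rule configuration
    updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS dq_results (
    run_id      TEXT,
    rule_id     TEXT REFERENCES dq_rules(rule_id),
    status      TEXT,                   -- PASS / FAIL
    pass_pct    REAL,
    executed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS dq_exceptions (
    run_id      TEXT,
    rule_id     TEXT,
    record_key  TEXT,                   -- key of the offending record
    detail      TEXT
);
""")
```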
- Rule Engine: The Rule Engine should be at the core of your DQ design. Think of it as an application or service that accepts DQ Rules from the repository along with input data (either during the data pipeline or after the fact, once data is loaded into the desired destination) and generates output and DQ Exceptions. The Rule Engine will have multiple downstream integrations for Incident Management, DQ notifications, DQ Portals/Dashboards, downstream workflows and Business Intelligence/Analytics on DQ metrics. A minimal sketch follows.
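This sketch handles only a single completeness-style check; a real engine would dispatch on rule_type and support many more rule families. The rule fields match the illustrative configuration shown earlier:

```python
from typing import Iterable

def run_rule(rule: dict, rows: Iterable[dict]) -> tuple[dict, list[dict]]:
    """Apply one completeness rule to input rows and return
    (summary result, exception records)."""
    column = rule["target_column"]
    exceptions = []
    total = 0
    for row in rows:
        total += 1
        if row.get(column) in (None, ""):
            # Each failing record becomes a DQ Exception for the repository.
            exceptions.append({"rule_id": rule["rule_id"], "record": row})
    pass_pct = 100.0 * (total - len(exceptions)) / total if total else 100.0
    status = "PASS" if pass_pct >= rule.get("threshold_pct", 100.0) else "FAIL"
    return {"rule_id": rule["rule_id"], "status": status, "pass_pct": pass_pct}, exceptions
```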
- DQ Exception Repository: DQ Exceptions produced as part of DQ execution results can be stored in their own repository, separated physically or logically (as a schema) from the DQ Rules Repository.
- Incident Management: Many organizations have a preferred Incident Management platform to create, track and manage business data incidents at various levels of severity, with SLA expectations for closing those incidents managed accordingly. DQ Rule Exceptions generated by the Rule Engine should leverage this enterprise Incident Management platform rather than a newly created solution. To achieve that, you need a solution for continuous integration and detection of logs, events, metrics and DQ results; there are many products in the market with this capability, such as Sumo Logic, Datadog, AppDynamics and so on. The enterprise incident platform will ensure that incidents are created in the correct queue and assigned to the correct teams for SLA-based resolution and notification.
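As a sketch of handing a failed rule off to an incident platform, the snippet below posts to a generic REST endpoint. The URL, payload fields and queue name are all placeholders, not any real product's API; substitute your organization's actual incident platform endpoint and its authentication:

```python
import requests

INCIDENT_API = "https://incidents.example.com/api/v1/incidents"  # hypothetical endpoint

def raise_incident(result: dict, exceptions: list) -> None:
    """Open an incident in the enterprise platform when a DQ rule fails."""
    if result["status"] != "FAIL":
        return
    payload = {  # illustrative fields; map to your platform's schema
        "title": f"DQ rule {result['rule_id']} failed",
        "severity": "high",
        "exception_count": len(exceptions),
        "queue": "data-quality",
    }
    resp = requests.post(INCIDENT_API, json=payload, timeout=10)
    resp.raise_for_status()
```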
- Notifications: Incidents are created in case of DQ Exceptions, but under normal circumstances you still need to send notifications about DQ executions and their results. The design should support common notification mechanisms such as email, collaboration-channel notifications (MS Teams, Slack, etc.) and in-application alerts. Notifications do not necessarily have to provide all the details; rather, they should point recipients to where execution details and results are stored. A notification may provide just a high-level summary and status.
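For example, a summary-only notification to a Slack channel via an incoming webhook might look like the following; the webhook URL and portal link are placeholders:

```python
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # your webhook URL

def notify(result: dict, portal_url: str) -> None:
    """Send a high-level summary to a collaboration channel; the message
    links to the DQ Portal rather than embedding full results."""
    text = (
        f"DQ rule {result['rule_id']}: {result['status']} "
        f"({result['pass_pct']:.1f}% passing). Details: {portal_url}"
    )
    requests.post(SLACK_WEBHOOK, json={"text": text}, timeout=10)
```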
- DQ Portal: A dedicated DQ Portal/Dashboard should be built to serve DQ execution results, their history and result attribution. Visibility should be based on level of access. DQ results should be served through DQ APIs that the Portal consumes.
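A minimal version of such an API, sketched here with FastAPI against the illustrative SQLite repository from earlier, might expose recent results per rule; authentication and access control are omitted for brevity but would gate visibility in practice:

```python
import sqlite3

from fastapi import FastAPI

app = FastAPI(title="DQ API")  # illustrative service behind the DQ Portal

@app.get("/dq/results/{rule_id}")
def get_results(rule_id: str, limit: int = 20):
    """Return recent execution results for one rule; the Portal consumes this."""
    conn = sqlite3.connect("dq_repository.db")
    rows = conn.execute(
        "SELECT run_id, status, pass_pct, executed_at FROM dq_results "
        "WHERE rule_id = ? ORDER BY executed_at DESC LIMIT ?",
        (rule_id, limit),
    ).fetchall()
    return [
        {"run_id": r[0], "status": r[1], "pass_pct": r[2], "executed_at": r[3]}
        for r in rows
    ]
```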
- DQ Results Repository: Just like the DQ Rules and DQ Exception repositories, you will need a DQ Results Repository to store execution details and results for DQ Rules, including long-term history of DQ results. Like the other repositories, it can be physically or logically separate. Visibility should be based on access levels.
- Business Workflow: As the DQ Framework is part of a larger data ecosystem, it is key to have automated orchestration that, after each DQ Engine execution, triggers either the happy-path workflow or the remediation/exception workflow.
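One way to wire this up is a branching gate in an orchestrator; the sketch below assumes Apache Airflow 2.x, and the DQ check task is a placeholder that would actually invoke the Rule Engine:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import BranchPythonOperator, PythonOperator

def run_dq_checks():
    # Placeholder: in reality this would invoke the Rule Engine and
    # return its summary result (pushed to XCom automatically).
    return {"rule_id": "CUST_EMAIL_NOT_NULL", "status": "PASS"}

def choose_path(ti):
    # Branch on the DQ summary published by the upstream check task.
    result = ti.xcom_pull(task_ids="run_dq_checks")
    return "happy_path" if result and result["status"] == "PASS" else "remediation"

with DAG("dq_gate", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):
    checks = PythonOperator(task_id="run_dq_checks", python_callable=run_dq_checks)
    branch = BranchPythonOperator(task_id="branch_on_dq", python_callable=choose_path)
    happy_path = EmptyOperator(task_id="happy_path")
    remediation = EmptyOperator(task_id="remediation")
    checks >> branch >> [happy_path, remediation]
```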
- Business Intelligence: The Data Management team should be provided with Business Intelligence, analytics, visualization and dashboarding capabilities over DQ metrics and their long-term history. This will help them understand DQ patterns and take the continuous steps needed for Data Quality Governance.
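As a small illustration of the kind of metric such a dashboard would surface, this pandas snippet computes a weekly pass-rate trend per rule from the illustrative results repository used earlier:

```python
import sqlite3

import pandas as pd

# Pull the long-term result history and compute a weekly pass-rate trend
# per rule, the kind of metric a DQ dashboard or BI tool would plot.
conn = sqlite3.connect("dq_repository.db")
results = pd.read_sql_query(
    "SELECT rule_id, pass_pct, executed_at FROM dq_results", conn
)
results["executed_at"] = pd.to_datetime(results["executed_at"])
trend = (
    results.set_index("executed_at")
    .groupby("rule_id")["pass_pct"]
    .resample("W")
    .mean()
    .reset_index()
)
print(trend.head())
```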
To stay up to date with my latest articles, make sure to subscribe to my newsletter and follow me on LinkedIn. And if you're interested in taking a deeper dive into some of these topics, please feel free to reach out to me.
#dataquality #dataqualitygovernance #hybridarchitecture #configurationdrivendq