Data Quality Monitoring

Data governance and data quality are top of mind in more and more organizations. In today's digital age, data has become a critical asset that guides the direction of businesses of all sizes.

Data quality describes the accuracy, completeness, consistency, and other attributes of data. Organizations need high-quality data they can trust in order to make critical decisions. Without it, they cannot become data-driven, because they cannot trust their data; that lack of trust prevents them from using their data to make impactful business decisions, leading to inefficiency, missed opportunities, and ultimately financial loss. Working with product or customer data from disparate sources without considering data quality can lead to disastrous results.

Gartner breaks the data quality problem down into the following aspects:

  • Parsing and standardization
  • Generalized “cleansing”
  • Matching
  • Profiling
  • Monitoring
  • Enrichment


Traditionally, ensuring data quality has been a rules-based, manual effort.

Key metrics of data quality monitoring

Error ratio

The error ratio measures the proportion of records with errors in a dataset. A high error ratio indicates poor data quality and could lead to incorrect insights or faulty decision-making. Divide the number of records with errors by the total number of entries to calculate the error ratio.
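As a rough sketch, the error ratio can be computed with a single aggregate query. The example below assumes a RAW.PAYPAL.PAYMENTS table (reused in the Snowflake example later) and treats rows with a missing amount or customer_id as records with errors; the table, columns, and error conditions are illustrative assumptions rather than a fixed definition.

-- Sketch: error ratio = records with errors / total records.
-- "Error" here is assumed to mean a missing amount or customer_id.
SELECT
  COUNT_IF(amount IS NULL OR customer_id IS NULL) AS error_records,
  COUNT(*) AS total_records,
  COUNT_IF(amount IS NULL OR customer_id IS NULL) / NULLIF(COUNT(*), 0) AS error_ratio
FROM RAW.PAYPAL.PAYMENTS;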

Duplicate record rate

Duplicate records can occur when multiple entries are created for a single entity due to system glitches or human error. The duplicate record rate calculates the percentage of duplicate entries within a given dataset compared to all records.
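A minimal SQL sketch, assuming payment_id should uniquely identify a record; any rows beyond the first per payment_id are counted as duplicates:

-- Sketch: duplicate record rate over an assumed payment_id key.
WITH per_key AS (
  SELECT payment_id, COUNT(*) AS cnt
  FROM RAW.PAYPAL.PAYMENTS
  GROUP BY payment_id
)
SELECT
  SUM(cnt - 1) AS duplicate_records,
  SUM(cnt) AS total_records,
  SUM(cnt - 1) / NULLIF(SUM(cnt), 0) AS duplicate_record_rate
FROM per_key;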

Data time-to-value

Data time-to-value describes the rate of obtaining value from data after it has been collected. A shorter time-to-value indicates that your organization is efficient at processing and analyzing data for decision-making purposes.

Data quality monitoring techniques

Data profiling

Data profiling is the process of examining, analyzing and understanding the content, structure and relationships within your data.
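A simple profiling query already reveals a lot about a column. The sketch below, again against the assumed RAW.PAYPAL.PAYMENTS table with assumed status and created_at columns, reports row counts, null counts, cardinality, and the date range covered:

-- Sketch: basic profile of one categorical and one timestamp column.
SELECT
  COUNT(*) AS row_count,
  COUNT_IF(status IS NULL) AS null_status_count,
  COUNT(DISTINCT status) AS distinct_statuses,
  MIN(created_at) AS earliest_record,
  MAX(created_at) AS latest_record
FROM RAW.PAYPAL.PAYMENTS;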

Data auditing

Data auditing is the process of assessing the accuracy and completeness of data by comparing it against predefined rules or standards. This technique helps organizations identify and track data quality issues, such as missing, incorrect, or inconsistent data.
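One way to express an audit in SQL is to compare a column against a reference standard. The sketch below assumes a hypothetical REFERENCE.STANDARDS.ALLOWED_PAYMENT_STATUSES table listing the permitted status values and counts the rows that fall outside it:

-- Sketch: rows whose status is not in the predefined standard.
SELECT p.status, COUNT(*) AS violating_rows
FROM RAW.PAYPAL.PAYMENTS p
LEFT JOIN REFERENCE.STANDARDS.ALLOWED_PAYMENT_STATUSES a
  ON p.status = a.status
WHERE a.status IS NULL
GROUP BY p.status
ORDER BY violating_rows DESC;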

Data quality rules

Data quality rules are predefined criteria that your data must meet to ensure its accuracy, completeness, consistency and reliability.
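In practice, each rule can be written as a named condition whose violations are counted. The rules below (positive amounts, non-null status, no future timestamps) are illustrative assumptions, not a standard set:

-- Sketch: one row of output per rule, with its current violation count.
SELECT 'amount_must_be_positive' AS rule_name, COUNT_IF(amount <= 0) AS violations
FROM RAW.PAYPAL.PAYMENTS
UNION ALL
SELECT 'status_must_not_be_null', COUNT_IF(status IS NULL)
FROM RAW.PAYPAL.PAYMENTS
UNION ALL
SELECT 'created_at_not_in_future', COUNT_IF(created_at > CURRENT_TIMESTAMP())
FROM RAW.PAYPAL.PAYMENTS;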

Data cleansing

Data cleansing, also known as data scrubbing or data cleaning, is the process of identifying and correcting errors, inconsistencies and inaccuracies in your data.
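A minimal cleansing sketch: trim whitespace, standardize casing, safely cast a raw text amount, and drop exact duplicates into a cleaned table. The CLEAN schema and the amount_raw column are assumptions for illustration:

-- Sketch: materialize a cleaned copy of the raw payments table.
CREATE OR REPLACE TABLE CLEAN.PAYPAL.PAYMENTS AS
SELECT DISTINCT
  payment_id,
  TRIM(customer_id) AS customer_id,
  UPPER(TRIM(status)) AS status,
  TRY_TO_DECIMAL(amount_raw, 12, 2) AS amount,  -- NULL instead of an error on bad input
  created_at
FROM RAW.PAYPAL.PAYMENTS;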

Real-time data monitoring

Real-time data monitoring is the process of continuously tracking and analyzing data as it is generated, processed and stored within your organization.
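Strictly real-time monitoring depends on your streaming stack, but a near-real-time approximation in Snowflake is a scheduled task that records a quality measurement every few minutes. The warehouse, schedule, and result table below are assumptions:

-- Sketch: a task that records the count of failed payments every 5 minutes.
CREATE OR REPLACE TASK MONITORING.TASKS.PAYMENT_FAILURE_CHECK
  WAREHOUSE = MONITORING_WH
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO MONITORING.RESULTS.PAYMENT_FAILURES (measured_at, failed_count)
  SELECT CURRENT_TIMESTAMP(), COUNT_IF(status = 'FAILED')
  FROM RAW.PAYPAL.PAYMENTS;

-- Tasks are created suspended; resume the task to start the schedule.
ALTER TASK MONITORING.TASKS.PAYMENT_FAILURE_CHECK RESUME;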

Tracking data quality metrics

Data quality metrics are quantitative measures that help organizations assess the quality of their data. These metrics can be used to track and monitor data quality over time, identify trends and patterns and determine the effectiveness of your data quality monitoring techniques.
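If each measurement is persisted with a timestamp, as in the task sketch above, trends fall out of a simple aggregation. Table and column names remain assumptions:

-- Sketch: daily trend of the failed-payment count recorded by the monitoring task.
SELECT
  DATE_TRUNC('day', measured_at) AS day,
  AVG(failed_count) AS avg_failed_count
FROM MONITORING.RESULTS.PAYMENT_FAILURES
GROUP BY day
ORDER BY day;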

Data performance testing

Data performance testing is the process of evaluating the efficiency, effectiveness and scalability of your data processing systems and infrastructure.


As technology continues to evolve, several trends are shaping the future of data quality management:

Artificial Intelligence and Machine Learning: AI and ML algorithms are increasingly being used to automate data quality processes, detect anomalies, and improve data cleansing techniques.

Blockchain Technology: Blockchain offers enhanced data integrity and security, reducing the risk of data tampering and ensuring trust in digital transactions.

Regulatory Landscape: Evolving regulatory requirements, such as GDPR and CCPA, are placing greater emphasis on data governance and compliance, driving organizations to prioritize data quality management.

Predictive Analytics: Predictive analytics allows organizations to anticipate and prevent data quality issues before they occur, enabling proactive management of data quality.

Snowflake Data Metric Functions

Data Metric Functions (DMFs) are a class of functions that can be used to monitor the quality of your data. Snowflake provides "out of the box" system DMFs, and you can also define your own. Once enabled, these functions provide regular metrics on data quality issues within the tables you specify.

Consider a table called RAW.PAYPAL.PAYMENTS.

Its rows represent raw data for payments made through a payment processor. Sometimes payments fail. It is useful to know how often this happens and to give users regular insight into the number of failures over a given period of time. DMFs make this extremely easy.

Steps

  1. Declare the DMF. DMFs are defined at a level of abstraction above any particular table, so a single DMF can be reused across multiple tables where appropriate (a sketch of steps 1 and 2 follows this list).
  2. On the table we want to monitor, set a schedule that determines how often the DMF should run.
  3. Attach the DMF to the table and column we are interested in.
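A sketch of steps 1 and 2, assuming fail_status should count rows whose status is 'FAILED' and that an hourly schedule is acceptable (both are illustrative choices):

-- Step 1: declare the DMF, independent of any particular table.
CREATE OR REPLACE DATA METRIC FUNCTION fail_status(arg_t TABLE(status VARCHAR))
RETURNS NUMBER
AS
$$
SELECT COUNT(*) FROM arg_t WHERE status = 'FAILED'
$$;

-- Step 2: tell the target table how often its attached DMFs should run.
ALTER TABLE RAW.PAYPAL.PAYMENTS SET DATA_METRIC_SCHEDULE = '60 MINUTE';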

With the DMF defined and a schedule in place, attaching it to the status column is a single statement:

ALTER TABLE RAW.PAYPAL.PAYMENTS
  ADD DATA METRIC FUNCTION fail_status ON (status);

Everything is logged to a table in SNOWFLAKE.LOCAL called DATA_QUALITY_MONITORING_RESULTS_RAW, which is in turn accessed through a view called DATA_QUALITY_MONITORING_RESULTS.
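The recorded measurements can then be queried straight from that view, for example (column names such as table_name and measurement_time reflect the current view definition and are worth confirming in your account):

-- Sketch: most recent data quality measurements for the payments table.
SELECT *
FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
WHERE table_name = 'PAYMENTS'
ORDER BY measurement_time DESC;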


Automation of data quality

AWS Glue Data Quality automatically computes statistics, recommends quality rules, monitors your data, and alerts you when it detects issues. For hidden and hard-to-find issues, Glue Data Quality uses ML algorithms. This combination of rule-based and ML approaches, together with a serverless, scalable, and open solution, helps you deliver high-quality data for confident business decisions.

Telmai Data Observability Platform helps organizations monitor and manage the quality of their data by providing a centralized view of data across all data sources. Telmai’s engine performs data profiling and analysis to identify potential issues, such as missing values, duplicate records, and incorrect data types; ML-based anomaly detection to surface unexpected values in data that may indicate problems and to predict what can be reasonably expected; and continuous monitoring to detect changes in data quality over time.

Google Cloud Dataplex performs data management and governance using machine learning to classify data, organize data in domains, establish data quality, determine data lineage, and both manage and govern the data lifecycle.

Final Words

Mastering data quality management is essential for organizations seeking to unlock the full potential of their data assets. By understanding the dimensions of data quality, addressing common challenges, adopting best practices, and embracing emerging trends, businesses can ensure data integrity, reliability, and relevance in an increasingly data-driven world.

Two practical starting points:

Critical data elements: Identify what is critical for the business; this could be a regulatory report, a cube, or a KPI.

Data value: Estimate the shelf life of poor-quality data, in other words the risk associated with it, and focus first on the areas with the highest risk.
