What is data quality?

Data quality measures how well a dataset meets criteria for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose, and it is critical to all data governance initiatives within an organization.

Data quality standards ensure that companies are making data-driven decisions to meet their business goals. If data issues such as duplicate records, missing values and outliers aren’t properly addressed, businesses increase their risk of negative business outcomes. According to a Gartner report, poor data quality costs organizations an average of USD 12.9 million each year.¹ As a result, data quality tools have emerged to mitigate the negative impact associated with poor data quality.
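
As an illustration, the sketch below uses Python’s pandas library to surface each of these issues in a tabular dataset; the file name, column names and the outlier rule are hypothetical, not prescribed by any standard.

    # A minimal sketch of diagnosing common data issues with pandas.
    # "customers.csv" and the "order_value" column are hypothetical.
    import pandas as pd

    df = pd.read_csv("customers.csv")

    # Duplicate records: repeated rows inflate counts and skew aggregates.
    print("Duplicate rows:", df.duplicated().sum())

    # Missing values: per-column count of nulls.
    print(df.isna().sum())

    # Outliers: flag values outside 1.5x the interquartile range.
    q1, q3 = df["order_value"].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df["order_value"] < q1 - 1.5 * iqr) | (df["order_value"] > q3 + 1.5 * iqr)
    print("Potential outliers:", mask.sum())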

When data quality meets the standard for its intended use, data consumers can trust the data and leverage it to improve decision-making, leading to the development of new business strategies or optimization of existing ones. However, when a standard isn’t met, data quality tools provide value by helping businesses to diagnose underlying data issues. A root cause analysis enables teams to remedy data quality issues quickly and effectively.

Data quality isn’t only a priority for day-to-day business operations; as companies integrate artificial intelligence (AI) and automation technologies into their workflows, high-quality data will be crucial for the effective adoption of these tools. As the old saying goes, “garbage in, garbage out,” and this holds true for machine learning algorithms as well: if an algorithm learns to predict or classify from bad data, it will yield inaccurate results.

Data quality vs. data integrity vs. data profiling

Data quality, data integrity and data profiling are all interrelated. Data quality is the broader category of criteria that organizations use to evaluate their data for accuracy, completeness, validity, consistency, uniqueness, timeliness and fitness for purpose. Data integrity focuses on a subset of these attributes, specifically accuracy, consistency and completeness, and approaches them primarily from the lens of data security, implementing safeguards to prevent data corruption by malicious actors.

Data profiling, on the other hand, is the process of reviewing and cleansing data to maintain data quality standards within an organization. The term can also encompass the technologies that support these processes.
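
A profiling pass usually begins with per-column summary statistics that reveal where cleansing is needed. The sketch below shows one minimal way to produce such a summary with pandas; the metrics chosen are illustrative rather than an exhaustive profile, and the file name is hypothetical.

    # A minimal sketch of column-level data profiling with pandas.
    # "customers.csv" is hypothetical, as in the earlier example.
    import pandas as pd

    def profile(df):
        # Summarize each column: data type, share of nulls,
        # and share of distinct values.
        return pd.DataFrame({
            "dtype": df.dtypes.astype(str),
            "null_pct": (df.isna().mean() * 100).round(1),
            "distinct_pct": (df.nunique() / len(df) * 100).round(1),
        })

    df = pd.read_csv("customers.csv")
    print(profile(df))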

Dimensions of data quality

Data quality is evaluated based on a number of dimensions, which can differ based on the source of information. These dimensions are used to categorize data quality metrics:

  • Completeness: This represents the amount of data that is usable or complete. If there is a high percentage of missing values, it may lead to a biased or misleading analysis if the data is not representative of a typical data sample.
  • Uniqueness: This accounts for the amount of duplicate data in a dataset. For example, when reviewing customer data, you should expect that each customer has a unique customer ID.
  • Validity: This dimension measures how much of the data conforms to the required format for business rules. Formatting usually includes metadata, such as valid data types, ranges, patterns, and more.
  • Timeliness: This dimension refers to the readiness of the data within an expected time frame. For example, customers expect to receive an order number immediately after they have made a purchase, and that data needs to be generated in real-time.
  • Accuracy: This dimension refers to the correctness of the data values based on the agreed upon “source of truth.” Since there can be multiple sources which report on the same metric, it’s important to designate a primary data source; other data sources can be used to confirm the accuracy of the primary one. For example, tools can check to see that each data source is trending in the same direction to bolster confidence in data accuracy.
  • Consistency: This dimension evaluates data records from two different datasets. As mentioned earlier, multiple sources can be identified to report on a single metric. Using different sources to check for consistent data trends and behavior allows organizations to trust the actionable insights from their analyses. This logic can also be applied to relationships between data. For example, the number of employees in a department should not exceed the total number of employees in a company.
  • Fitness for purpose: Finally, fitness for purpose helps to ensure that the data asset meets a business need. This dimension can be difficult to evaluate, particularly with new, emerging datasets.

These metrics help teams conduct data quality assessments across their organizations to evaluate how informative and useful data is for a given purpose.
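
Many of these dimensions lend themselves to automated measurement. The sketch below shows one possible way to express a few of them as checks with pandas; the column names, the e-mail pattern, the freshness window and the headcount figures are all hypothetical.

    # A minimal sketch of expressing some data quality dimensions as
    # automated checks. Column names, the e-mail pattern, the freshness
    # window and the headcount figures are all hypothetical.
    import pandas as pd

    def completeness(df, column):
        # Share of non-null values in a column.
        return 1 - df[column].isna().mean()

    def uniqueness(df, key):
        # True if every record has a distinct identifier.
        return not df[key].duplicated().any()

    def validity(df, column, pattern):
        # Share of values matching a required format (a business rule).
        return df[column].astype(str).str.fullmatch(pattern).mean()

    def timeliness(df, ts_column, max_lag):
        # Share of records generated within the expected time frame.
        lag = pd.Timestamp.now(tz="UTC") - pd.to_datetime(df[ts_column], utc=True)
        return (lag <= max_lag).mean()

    def consistency(dept_headcounts, company_total):
        # Cross-dataset rule: department headcounts cannot exceed
        # the company-wide total.
        return dept_headcounts.sum() <= company_total

    df = pd.read_csv("customers.csv")
    print("Completeness:", completeness(df, "email"))
    print("Uniqueness:", uniqueness(df, "customer_id"))
    print("Validity:", validity(df, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"))
    print("Timeliness:", timeliness(df, "created_at", pd.Timedelta("24h")))
    print("Consistency:", consistency(pd.Series({"sales": 40, "support": 25}), 80))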

Why is data quality important?

Over the last decade, developments within hybrid cloud, artificial intelligence, the Internet of Things (IoT), and edge computing have led to the exponential growth of big data. As a result, the practice of master data management (MDM) has become more complex, requiring more data stewards and rigorous safeguards to ensure good data quality.

Businesses rely on data quality management to support their data analytics initiatives, such as business intelligence dashboards. Without this, there can be devastating consequences, even ethical ones, depending on the industry (e.g. healthcare). Data quality solutions exist to help companies maximize the use of their data, and they have driven key benefits, such as:

  • Better business decisions: High-quality data allows organizations to identify key performance indicators (KPIs) to measure the performance of various programs, which allows teams to improve or grow them more effectively. Organizations that prioritize data quality have a clear advantage over their competitors.
  • Improved business processes: Good data also means that teams can identify where there are breakdowns in operational workflows. This is particularly true for the supply chain industry, which relies on real-time data to determine appropriate inventory levels and to track the location of goods after shipment.
  • Increased customer satisfaction: High data quality provides organizations, particularly marketing and sales teams, with deep insight into their target buyers. They are able to integrate data from across the sales and marketing funnel, which enables them to sell their products more effectively. For example, the combination of demographic data and web behavior can inform how organizations create their messaging, invest their marketing budget, or staff their sales teams to service existing or potential clients.
