Practical Data Quality
Lake Tahoe on Unsplash: Tim Peterson

Imagine having pristine, good-quality data for all your analytics, machine learning, and decision-making. Yes, I can't imagine it either!

How does one define data quality? There are many ways to do it. Let us start simple: data has to be 100% accurate and complete to be considered good quality. Such a simple, utopian view of data quality seldom leads to anything but disappointment.
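To make the "100% complete" bar concrete, here is a minimal sketch of one possible completeness measure. The function name `completeness`, the sample records, and the rule that `None` or an empty string counts as missing are all assumptions made for illustration; they are not a standard definition.

```python
from typing import Any, Mapping, Sequence


def completeness(
    records: Sequence[Mapping[str, Any]], required: Sequence[str]
) -> float:
    """Fraction of required fields that are present and non-empty."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0  # nothing was required, so nothing is missing
    filled = sum(
        1
        for record in records
        for field in required
        # Assumption: None and "" both count as missing values.
        if record.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical sample data: one record has an empty email.
records = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "email": ""},
]

score = completeness(records, ["name", "email"])
print(score)  # 0.75

# The utopian rule: anything short of 100% is "bad" data.
is_good_quality = score == 1.0
print(is_good_quality)  # False
```

A single empty field is enough to fail the strict rule, which hints at why the all-or-nothing view tends to disappoint in practice.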

Let us examine this further through a series of questions.

  1. Where does data come from? There are two possibilities: human-generated and machine-generated.
  2. Where does data get stored at first? Every data element is stored in its system of initial record (entry), the system where it is first recorded.
  3. How do you combine data elements with one another? By connecting them through something they have in common, which belongs to one or more systems that talk to one another directly or through a central hub.
  4. How do you consume data? It varies: through a system of record, or through a purpose-specific or generic system of consumption.

Each data element goes through its own journey from the point of generation to the present, when we try to use it for some purpose. Along the way, data elements get created, linked to other data elements, overwritten, and sometimes even deleted before they are accessible to consumers.

Let us imagine that data elements were generated as good data (no human, machine, or integration errors). When does the data change from good to bad? One thought: data elements can also have a temporal value, that is, they can be time-sensitive.

For example, the price of a publicly traded stock at 10 AM is valuable when your purpose is to know the current value. But if the question remains the same at 12 PM, the 10 AM price is no longer current; i.e., your good data at 10 AM has turned into bad data in the context of your question.
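The stock example can be sketched as a simple freshness check. Everything here is an assumption made for illustration: the `Quote` class, the `is_current` function, the symbol `XYZ`, the price, the dates, and the one-minute tolerance are hypothetical and would depend on the question being asked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Quote:
    symbol: str
    price: float
    observed_at: datetime  # when the price was recorded


def is_current(quote: Quote, asked_at: datetime, max_age: timedelta) -> bool:
    """True if the quote is recent enough to answer 'what is the price now?'."""
    age = asked_at - quote.observed_at
    # A quote from the future (negative age) is not treated as current either.
    return timedelta(0) <= age <= max_age


# Hypothetical quote recorded at 10 AM.
quote = Quote("XYZ", 101.25, datetime(2024, 1, 2, 10, 0))
tolerance = timedelta(minutes=1)  # assumed tolerance for "current"

print(is_current(quote, datetime(2024, 1, 2, 10, 0), tolerance))  # True
print(is_current(quote, datetime(2024, 1, 2, 12, 0), tolerance))  # False
```

The stored value never changes between the two calls; only the time of the question does, and that alone is enough to flip the answer.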

What are some of the data quality issues you face? In the next article, let us look at some of the most common data quality problems and how we can solve them.

Here are complete links to all the posts in this series:

  1. Practical Data Quality - This article
  2. Human Data Entry - DQ
  3. Dealing with Leadgen data - specific example
  4. Generalizing the data quality issues - Part 1

Sree Rekha Adla

Leadership | Data Analytics & Software Engineering | Wellness Enthusiast

2 years ago

Wow, simple and straight, looking forward to the series!


More articles by Karteek Y.
