Data Discrepancies

Data Discrepancies are often confused with Data Errors, but the two are not the same. Data Errors are searched for, identified, and fixed based on the Data Discrepancies that have been found.

Let’s decode it...

‘Dictionary: An illogical or surprising lack of compatibility or similarity between two or more facts is called a discrepancy.’

A data discrepancy can be, for example, a mismatch between the data at the source versus the data after processing.
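A source-versus-processed mismatch can be sketched in a few lines of Python. This is only an illustration: the record keys, the amounts, and the helper name `find_discrepancies` are all made up for the example and do not come from any real system.

```python
# Hypothetical records keyed by an order id: what the source system holds
# versus what arrived after processing. All values are invented.
source = {"A1": 100.0, "A2": 250.0, "A3": 75.5}
processed = {"A1": 100.0, "A2": 205.0}


def find_discrepancies(source, processed):
    """Return keys missing on either side and keys whose values differ."""
    missing_in_processed = sorted(source.keys() - processed.keys())
    missing_in_source = sorted(processed.keys() - source.keys())
    mismatched = sorted(
        key
        for key in source.keys() & processed.keys()
        if source[key] != processed[key]
    )
    return missing_in_processed, missing_in_source, mismatched


missing_in_processed, missing_in_source, mismatched = find_discrepancies(
    source, processed
)
print("Missing after processing:", missing_in_processed)
print("Unexpected after processing:", missing_in_source)
print("Values that differ:", mismatched)
```

The check only flags where the two sides disagree. Whether the cause is a data error, a code error, or a business rule error still has to be investigated.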

Sometimes, when data analysts run manual calculations, the results do not match the numbers coming from automated reports, which means there is a data reconciliation issue. This is a Data Discrepancy.

For example, two results that should agree under the business logic do not match. Refer to the image: the total of 9,093 + 6,104 is 15,197, which is not equal to All Users, i.e., 14,018. This suggests that the total of 14,018 is coming from one source while 9,093 and 6,104 are coming from another. This is a Data Discrepancy.
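The arithmetic behind this example can be checked with a minimal sketch. The three numbers are the ones quoted above; the variable names are only illustrative.

```python
# Figures quoted in the example above.
segment_counts = [9093, 6104]
all_users = 14018

segment_total = sum(segment_counts)   # 9,093 + 6,104 = 15,197
gap = segment_total - all_users       # 15,197 - 14,018 = 1,179
has_discrepancy = gap != 0

print(f"Sum of segments: {segment_total:,}")
print(f"All Users:       {all_users:,}")
print(f"Gap:             {gap:,}")
print("Discrepancy found" if has_discrepancy else "Totals reconcile")
```

The gap of 1,179 is only the signal that something does not reconcile. It does not say which of the figures, if any, is wrong.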

As mentioned above, once data discrepancies are found, the next step is to start identifying the underlying data errors, code errors, business rule errors, etc.

Please note, a discrepancy does not always mean there is a data issue. It can be that not all the facts were used when processing both sides of the data. For example, one analyst runs a query that sums all the rows of a table and sends the report to management. A second analyst runs a query to find out what sales took place in each region, but filters out one of the regions. As a result, the totals of the two reports will not match.
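The two-analyst scenario can be reproduced with a small sketch. The sales rows, region names, and amounts below are invented purely for illustration; the point is that the data itself is fine and the gap is fully explained by the filter.

```python
# Hypothetical sales table; regions and amounts are made up.
sales = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 80},
    {"region": "East", "amount": 50},
    {"region": "West", "amount": 150},
]

# Analyst 1: sum of all the rows in the table.
grand_total = sum(row["amount"] for row in sales)

# Analyst 2: sales per region, but with one region filtered out.
excluded_regions = {"East"}
sales_by_region = {}
for row in sales:
    if row["region"] in excluded_regions:
        continue
    sales_by_region[row["region"]] = (
        sales_by_region.get(row["region"], 0) + row["amount"]
    )
regional_total = sum(sales_by_region.values())

# The two reports disagree by exactly the amount of the filtered region.
gap = grand_total - regional_total
excluded_amount = sum(
    row["amount"] for row in sales if row["region"] in excluded_regions
)

print("Report 1 total:", grand_total)
print("Report 2 total:", regional_total)
print("Gap:", gap, "| Filtered-out sales:", excluded_amount)
```

Here both reports are correct for the question each analyst asked. The discrepancy disappears once the filter is accounted for.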

Data Discrepancies are normally found when data is sitting in Silos (explained separately). That is the reason the whole concept of a Single Source of Truth has surfaced, where the Data Warehouse addresses this issue by bringing all data silos into one place, ideally in one Unified Data Model or in Star or Snowflake schema(s).

Cheers.

Image: https://stackoverflow.com/questions/43284068/discrepancy-in-google-analytics-data-when-using-segments


