Improving Data Quality using Feedback loops

Data Quality conundrum:

In any data-driven application, particularly a data analytics product, data quality is one of the most unstable variables, and it can sink the whole product.

The problem is not only poor data quality. The most serious obstacle in developing data analytics products is the lack of good-quality data sets for the development and testing phases.

In a standard R&D approach, the product squad conceptualizes a product along with business leaders and other stakeholders. The usual best-case scenario is to move forward with a development partner who provides a sample data set for development. The worst-case scenario is to generate sample data based on common understanding and trial and error, speculating about what kind of data might arrive once the product is plugged into the customer's real-time data stream.

This whole approach rests on speculation: we know very little about the data quality of the real-world system. Some organizations also look to buy real-world data (RWD) to address this problem. However, RWD is expensive, and organizations have no way of knowing the return on their investment ahead of time.

Feedback loop to the rescue:

With this problem at hand, the question we should all be asking is how to break this data quality conundrum. The answer is simple but needs a little patience: create a feedback loop. The feedback loop surfaces the issues that occur in the real-world data. The data quality team picks up these issues and continuously improves the data quality testing framework, making it more robust and, in turn, making the product more reliable.

Now, a few things to keep in mind:

  • The whole process might look tiring and lengthy, but over a period of time (say 6-8 months) a marked rise in the quality of the product will be seen. Trust the process.
  • You have to collaborate with the first few customers, who act as an early monitoring stage. This collaboration exists only to make sure that critical data load failures do not get escalated. Typically, the sales team approaches these customers with an offer to use the product for free for a limited time.
  • The DevOps team and data quality specialists will work closely with the solution delivery team to absorb all possible data anomalies into the testing pipeline on CI/CD.

What it means for a Data Quality Team:

The data quality team has to play a crucial role in making this whole setup a success. During the early development of the product, Data QA should focus on automating all possible test scenarios related to data quality. This automated test suite is then integrated into the CI/CD pipeline.
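To make the idea of an automated data quality suite concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the field names (`customer_id`, `amount`), the two rules, and the helper names `run_checks` and `suite_passed` are hypothetical and do not come from any particular product or framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Check:
    name: str
    passes: Callable[[Record], bool]  # returns True when the record is acceptable


def _amount_ok(record: Record) -> bool:
    # Hypothetical rule: amount must be numeric and not negative.
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


# Hypothetical rules; a real suite would encode the product's own expectations.
CHECKS = [
    Check("customer_id_present", lambda r: bool(r.get("customer_id"))),
    Check("amount_non_negative", _amount_ok),
]


def run_checks(records: List[Record], checks: List[Check]) -> Dict[str, List[int]]:
    """Return, for each check name, the indexes of the records that failed it."""
    failures: Dict[str, List[int]] = {check.name: [] for check in checks}
    for index, record in enumerate(records):
        for check in checks:
            if not check.passes(record):
                failures[check.name].append(index)
    return failures


def suite_passed(failures: Dict[str, List[int]]) -> bool:
    """True only when no check reported a failing record."""
    return all(not rows for rows in failures.values())
```

A CI/CD job could run `run_checks` against a fixture data set and fail the build whenever `suite_passed` returns False.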

During product implementation, the data quality team should work closely with the product implementation team to make sure that all anomalies in the RWD are identified and analyzed during the historical data load. This step is critical because the majority of issues and anomalies will be identified during the historical load.

Once the anomalies are identified, the data quality team starts implementing all relevant data scenarios in the existing product test suite.
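The loop described here can be sketched as a small registry: anomalies seen in real-world loads are recorded, and any recorded anomaly that still passes every existing check points to a gap in the test suite. This is a hedged sketch, not a prescribed implementation. The class `FeedbackLoop`, its methods, the `is_valid_date` check, and the `DD/MM/YYYY` date format are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[Record], bool]  # True means the record passes the check


def is_valid_date(record: Record) -> bool:
    # Hypothetical check: the "date" field must be a real DD/MM/YYYY date.
    try:
        datetime.strptime(str(record.get("date")), "%d/%m/%Y")
        return True
    except ValueError:
        return False


class FeedbackLoop:
    """Keeps anomalies seen in real-world data next to the checks meant to catch them."""

    def __init__(self, checks: Dict[str, Check]) -> None:
        self.checks = dict(checks)
        self.known_anomalies: List[Record] = []

    def report_anomaly(self, record: Record) -> None:
        """Record a row that caused trouble during a real-world data load."""
        self.known_anomalies.append(record)

    def add_check(self, name: str, check: Check) -> None:
        """Extend the suite with a new check."""
        self.checks[name] = check

    def uncaught(self) -> List[Record]:
        """Known anomalies that pass every current check, i.e. gaps in the suite."""
        return [
            record
            for record in self.known_anomalies
            if all(check(record) for check in self.checks.values())
        ]
```

In use, a row such as `{"id": "A1", "date": "31/02/2021"}` reported from a historical load would show up in `uncaught()` until a check like `is_valid_date` is added. After that, the anomaly is covered by the suite on every future run.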

Everything will make sense in the end:

Apart from creating robust data quality checks for a specific product, feedback loops serve another significant purpose: they increase the understanding and knowledge of the data quality team. In the long term, we get a realistic picture of the real-world data of a particular domain.

Whenever a new data analytics product is being conceptualized, the data quality team will be ready with superior knowledge of the real-world data, carried over from previous iterations.

Trust me when I say that an understanding of the domain data is invaluable. One can explore every nook and corner of the RWD and come out better than in previous iterations.

Many thanks for sharing your valuable insights Adarsh Srivastava.
