Why Getting Answers from Untidy Data is So Difficult
Chris Deniziak
Driving Data Innovation | Data Architecture & Engineering Leader | Strategizing for Scalable, Future-Ready Analytics Solutions
Businesses are increasingly turning to data to make better decisions, and this trend is not slowing down. For data to be truly effective, however, it needs to be clean and accurate. Untidy data can make it difficult to answer even the simplest questions.
Untidy data is difficult to work with for many reasons. Three of the most common are inconsistent data, missing data, and duplicate data. Untidy data often contains inconsistent values: a customer's name might be spelled differently in different records, which makes it difficult to identify and track customers. Untidy data is also often missing values. For example, a customer record might lack an address or phone number, which can make it difficult to contact customers or deliver products or services. Missing values also compound the inconsistency problem, because the attributes you would otherwise use to tie mismatched records together are not there. Finally, untidy data will more than likely contain duplicate records, such as two or more records for the same customer. Combined with the previous two problems, duplicates make it difficult to track customer activity and identify trends, and harder still to break down data silos and bring data together.
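To make the three problems concrete, here is a minimal sketch in plain Python that profiles a handful of customer records for missing values and likely duplicates. The sample records, the field names, and the choice of email as the matching key are all assumptions made for illustration, not anything prescribed above.

```python
from collections import Counter

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"id": 1, "name": "Jane Smith", "email": "jane@example.com", "phone": "555-0100"},
    {"id": 2, "name": "jane  smith", "email": "JANE@EXAMPLE.COM", "phone": None},
    {"id": 3, "name": "Bob Jones", "email": "", "phone": "555-0199"},
    {"id": 4, "name": "Bob Jones", "email": "bob@example.com", "phone": "555-0199"},
]


def normalize(value):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(str(value or "").lower().split())


def profile(rows, required=("name", "email", "phone"), key="email"):
    """Count missing values per field and group row ids that share a normalized key."""
    missing = Counter()
    groups = {}
    for row in rows:
        for field in required:
            if not normalize(row.get(field)):
                missing[field] += 1
        k = normalize(row.get(key))
        if k:  # rows with no key value cannot be matched on it
            groups.setdefault(k, []).append(row["id"])
    duplicates = {k: ids for k, ids in groups.items() if len(ids) > 1}
    return dict(missing), duplicates


missing_counts, duplicate_groups = profile(records)
print(missing_counts)
print(duplicate_groups)
```

In this sample, records 1 and 2 are matched only after normalizing the inconsistent email casing, while record 3 cannot be tied to record 4 on email at all because its email is missing. That is the compounding effect described above.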
There are numerous ways to start tackling untidy data. But why do companies choose to ignore untidy data and spend more time trying to work with it than trying to fix it? Some of the challenges are:
How do we fix untidy data? Great question. As mentioned before, there are many ways to start cleaning up your data. First, identify the problems with your data. What is making the data untidy? Is there a pattern to the issues you're uncovering? Next, develop a plan to fix the problems. This may involve deleting duplicate records, correcting inconsistent values, or filling in missing data. Fixing existing records is only part of that plan: if new records keep arriving with the same issues, you need to trace the problem back to its source. Lastly, test and verify your data. Many teams do the development necessary to clean the records but never verify that the rules in place actually handle all records. Verification is a crucial step in developing a path towards data governance.
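The fix-then-verify loop can be sketched in a few lines of plain Python. This is only one possible shape under stated assumptions: the `clean` and `verify` helpers, the field names, and the rule that records sharing an email are the same customer are all hypothetical choices for illustration.

```python
def clean_value(value):
    """Trim and collapse whitespace; treat empty strings as missing (None)."""
    text = " ".join(str(value).split()) if value is not None else ""
    return text or None


def clean(rows, key="email"):
    """Standardize values, then merge rows that share the same key."""
    merged = {}
    unkeyed = []  # rows without a key are kept but cannot be merged
    for row in rows:
        fixed = {field: clean_value(value) for field, value in row.items()}
        if not fixed.get(key):
            unkeyed.append(fixed)
            continue
        fixed[key] = fixed[key].lower()
        kept = merged.setdefault(fixed[key], fixed)
        for field, value in fixed.items():
            if kept.get(field) is None:  # fill gaps from the duplicate record
                kept[field] = value
    return list(merged.values()) + unkeyed


def verify(rows, required=("name", "email"), key="email"):
    """Return a list of rule violations; an empty list means every rule passed."""
    problems = []
    seen = set()
    for index, row in enumerate(rows):
        for field in required:
            if not row.get(field):
                problems.append(f"row {index}: missing {field}")
        k = row.get(key)
        if k:
            if k in seen:
                problems.append(f"row {index}: duplicate {key} {k}")
            seen.add(k)
    return problems


# Hypothetical raw records for illustration.
raw = [
    {"name": "Jane Smith ", "email": "Jane@Example.com", "phone": None},
    {"name": "Jane Smith", "email": "jane@example.com", "phone": "555-0100"},
    {"name": "Bob Jones", "email": "", "phone": "555-0199"},
]

cleaned = clean(raw)
remaining = verify(cleaned)
print(cleaned)
print(remaining)
```

Here the cleaning rules merge the two Jane Smith records and fill in the missing phone number, but the verification pass still flags the Bob Jones record for its missing email. No cleaning rule can invent that value, so this is the kind of issue that has to be traced back to where the record was created.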
Always remember to start small. If you're looking to tackle data quality issues, don't try to fix everything at once. If you've found numerous issues within your data, find the one that aligns best with business goals and objectives. Starting there is likely to provide the most value for your investment in cleaning up untidy data. A crawl, walk, run approach will be one of your best sidekicks.