How? - Considerations for Organizing Data
Now that we have a fairly good idea of how to think about data, let's talk about the How. As we discussed previously, data can be sourced from multiple data sources at different frequencies and in different volumes. So, as a Data Practitioner (DP), your job is to organize, manage and deliver data to the point of decision making. Note that I am saying Data Practitioner and not Data Engineer, Data Analyst, Data Scientist or Data Steward. There could be other sub-classifications too, but if you are in the data domain, it is important to have a complete understanding of data across the organization, even though you might specialize in one of the sub-domains.
A few things need to be thought through to get all the data prepped and ready for consumption by the rest of the organization. Let's take the example of GroomGlow that we have been discussing. While there are many ways to acquire data, we will just consider a few typical scenarios.
Data Management & Organization:
We can group the following terms under this category: Data Classification, Data Standardization, Metadata Management, Master Data Management, Data Governance and so on. We will go through each one of these in detail during this series. Basically, once the data is collected, it is imperative that we classify, standardize and govern the data on a daily basis. I say daily basis because you will find that data is very hard to govern across the organization. Collection and enforcement fall upon Data Practitioners like us. So far, I have stayed away from discussing tools and technologies. But in the diagram above, I have mentioned some of the tools and technologies that you might come across in your data journey. Products like ZScore's Smart Data Platform help with this quite a bit, using AI/ML technologies as well.
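To make "classify and standardize" a little more concrete, here is a minimal sketch in Python. The field names, the sensitivity classes and the sample customer record are all made up for illustration; they are not taken from GroomGlow's actual data or from any particular product.

```python
import re

# Hypothetical classification rules: field name -> sensitivity class.
FIELD_CLASSES = {
    "name": "personal",
    "email": "personal",
    "phone": "personal",
    "service": "internal",
    "price": "internal",
}


def standardize(record):
    """Return a copy of the record with consistent formats for known fields."""
    # Trim stray whitespace from every string value.
    clean = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(clean.get("email"), str):
        clean["email"] = clean["email"].lower()
    if isinstance(clean.get("phone"), str):
        clean["phone"] = re.sub(r"\D", "", clean["phone"])  # keep digits only
    return clean


def classify(record):
    """Tag each field with its class; unknown fields become 'unclassified'."""
    return {k: FIELD_CLASSES.get(k, "unclassified") for k in record}


raw = {"name": "  Asha Rao ", "email": "Asha@Example.COM ", "phone": "(555) 010-0199"}
clean = standardize(raw)
labels = classify(clean)
```

The point of the sketch is that the rules live in one place (`FIELD_CLASSES` and `standardize`), so they can be applied to every load rather than once. Anything that comes back as 'unclassified' is a prompt for someone to make a governance decision.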
Data Auditability
This should ideally fall under data management as well, but I am calling it out as a separate section because of legal and security reasons. Data Traceability, Lineage and Observability all fall under this category. As much as you would like to think that you have secured your data, it is a very valuable commodity these days and the threat of theft is always there. Now, you may ask: can't most of the threats be taken care of by good firewalls and cyber security practices? Not really. A significant share of data breaches involve internal actors, whether through collusion, misuse or simple error. So, it is important to ensure that every touch point on the data that is under your control is logged and kept. Who touched it, what changed, when it changed and pretty much everything else that you can think of should be tracked and traced.
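As a rough illustration of "who touched it, what changed, when did it change", here is a minimal sketch of an append-only audit log in Python. The class names, the fields and the sample events are assumptions made for this example. A real setup would write to durable, tamper-resistant storage rather than to a list in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str      # who touched the data
    dataset: str    # which dataset was touched
    action: str     # what kind of touch: "read", "update", "delete", ...
    before: dict    # values before the change (empty for reads)
    after: dict     # values after the change (empty for reads)
    at: datetime    # when it happened, in UTC


class AuditLog:
    """Append-only log kept in memory; events are added, never edited."""

    def __init__(self):
        self._events = []

    def record(self, actor, dataset, action, before=None, after=None):
        event = AuditEvent(
            actor, dataset, action,
            dict(before or {}), dict(after or {}),
            datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def history(self, dataset):
        """Every recorded touch on one dataset, oldest first."""
        return [e for e in self._events if e.dataset == dataset]


log = AuditLog()
log.record("asha", "customers", "update",
           before={"phone": "5550100100"}, after={"phone": "5550100199"})
log.record("ben", "appointments", "read")
```

Calling `log.history("customers")` then answers the who, what and when questions for that dataset. Note that reads are logged too, not just changes, since a lot of misuse never modifies anything.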
Data Quality
This is a subject on its own, so I will elaborate on this in a different post. For now, remember that if you don't get this right, everything else falls apart.
Data Storage
This is the reason why we spoke about volume of data, frequency of data, derived data, granularity etc. Data Storage costs can skyrocket if you don't manage this with a vision. Your data architecture should take this into consideration seriously. Remember, part of our mandate is to manage costs.
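A quick back-of-the-envelope calculation shows how volume, frequency and granularity feed into storage. The sketch below is in Python, and every number in it (row counts, row size, retention, price per GB) is a made-up assumption for illustration, not a real GroomGlow figure or a real vendor price.

```python
def steady_state_gb(rows_per_load, bytes_per_row, loads_per_day, retention_days):
    """Rough volume held for one dataset once retention is full, in GB."""
    total_bytes = rows_per_load * bytes_per_row * loads_per_day * retention_days
    return total_bytes / 1e9


def monthly_cost(gb, price_per_gb_month):
    """Storage cost per month for a given volume at a flat price per GB."""
    return gb * price_per_gb_month


# Hypothetical: raw events loaded hourly and kept for a year...
raw_gb = steady_state_gb(rows_per_load=50_000, bytes_per_row=200,
                         loads_per_day=24, retention_days=365)

# ...versus a derived daily summary that is far coarser in granularity.
summary_gb = steady_state_gb(rows_per_load=500, bytes_per_row=200,
                             loads_per_day=1, retention_days=365)

PRICE = 0.02  # assumed flat price per GB per month, for illustration only
print(f"raw: {raw_gb:.2f} GB, summary: {summary_gb:.4f} GB")
print(f"raw cost/month: {monthly_cost(raw_gb, PRICE):.2f}")
```

The absolute numbers matter less than the multipliers: doubling the load frequency or the retention period doubles the volume, which is why these choices belong in the architecture discussion and not as an afterthought.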
I'll stop this one here. Hope you understand why I took so long to talk about tools and technologies. We will talk about Data Quality next and then we will go on to Delivering data at the point of decision making.