Using Data Catalog And Data Lineage As a Saviour

As a Product Manager, your role will be aligned to a particular Product Line (PL) and Sub Product Line (SPL) of your organisation. For me, it is Data Analytics and Business Intelligence. Based on my experience so far, this article covers a couple of tools that will make your job easier if you work with data regularly.

With the exponential increase in the volume and variety of data coming from inside and outside the organisation, data landscapes are becoming more complex by the day. As many organisations move to cloud-based infrastructure, more applications are deployed as services, and data is becoming more fragmented. And as businesses become data driven, most business users are asking for an easy way to consume data for their needs. To turn this data into a valuable asset, and to bring value to the organisation's operational and analytical systems as well as to its consumers, we need a way to categorize and classify all the data automatically and at scale. It is also necessary to be aware of the life cycle of the data and the transformations it undergoes from the source to the database. That is where the concepts of Data Catalog and Data Lineage come into the picture, as they help you track data easily in terms of schemas, views and tables.

Before we take a deep dive into the usefulness of data catalog and data lineage, I will take a minute to describe what they are at a high level. A Data Catalog is a collection of metadata, combined with data management and search tools, that helps data users find the data they need. It also serves as an inventory of available data and provides the information needed to evaluate that data for its intended uses. Data Lineage, in contrast, presents the genesis of a dataset and how it adapts and evolves on its journey. It describes a dataset's origin, movement, characteristics and quality.
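To make the catalog definition concrete, here is a minimal sketch in Python of a catalog as a searchable inventory of metadata. Everything in it is an assumption made for illustration: the `CatalogEntry` and `DataCatalog` names, the metadata fields and the sample datasets are hypothetical and do not come from any specific catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one dataset (field names are illustrative)."""
    name: str
    description: str
    owner: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory inventory of dataset metadata with keyword search."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Later registrations with the same name overwrite earlier ones.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return sorted names of datasets whose name, description or tags contain the keyword."""
        kw = keyword.lower()
        hits = [
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        ]
        return sorted(hits)


# Hypothetical sample datasets.
catalog = DataCatalog()
catalog.register(CatalogEntry("sales.orders", "Customer orders by day", "sales-team", {"revenue", "orders"}))
catalog.register(CatalogEntry("hr.employees", "Employee master data", "hr-team", {"people"}))
catalog.register(CatalogEntry("finance.revenue_monthly", "Monthly revenue rollup", "finance-team", {"revenue"}))

print(catalog.search("revenue"))
```

A real catalog would persist this metadata and harvest it automatically from source systems, but the idea is the same: users search the metadata rather than the data itself.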

Data Catalog and Data Lineage help an organisation in the following ways:

  • Self-Service Discovery for Analytics: A catalog promotes self-service by helping users find the data required for their analysis.
  • Data Governance: A catalog provides the ground truth for governance. It reflects the presence, use and quality of the physical data in the data landscape in a way that business users can understand.
  • IT Operations Analysis: A catalog can show all data dependencies and help IT users understand the impact of any changes they are planning.
  • Automation: A data catalog uses machine learning to automate and simplify the collection and classification of metadata at scale.
  • Life Cycle Tracking: With data lineage, tracking how data progresses as it interacts with other datasets creates a life cycle of information. This can then be assessed to implement effective changes to business operations.
  • Data Integration: Data lineage shows how data is manipulated in the ETL (extract, transform, load) process, so that data quality can be assessed before data is loaded into an analytics tool.
  • Change Management: Data lineage also helps you understand the impact of data changes on downstream analytics and applications, understand the risk of change to business processes, and take a more proactive approach to change management.
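The impact analysis described in the list above can be sketched by treating lineage as a directed graph and walking it. This is a minimal Python illustration, assuming a hand-written edge map. The dataset names and the `downstream` and `upstream` helpers are hypothetical, and real lineage tools usually derive these edges from ETL jobs or query logs.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets in its list.
LINEAGE = {
    "crm.customers_raw": ["staging.customers_clean"],
    "erp.orders_raw": ["staging.orders_clean"],
    "staging.customers_clean": ["warehouse.customer_orders"],
    "staging.orders_clean": ["warehouse.customer_orders"],
    "warehouse.customer_orders": ["bi.revenue_dashboard", "bi.churn_report"],
}


def downstream(dataset, lineage):
    """Breadth-first walk listing every dataset affected by a change to `dataset`."""
    seen = set()
    order = []
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(dataset, lineage):
    """Return the sorted list of all sources that `dataset` is derived from."""
    # Invert the edge map, then reuse the same walk in the opposite direction.
    parents = {}
    for source, targets in lineage.items():
        for target in targets:
            parents.setdefault(target, []).append(source)
    return sorted(downstream(dataset, parents))


# What breaks if the raw orders feed changes?
print(downstream("erp.orders_raw", LINEAGE))
# Where does the churn report come from?
print(upstream("bi.churn_report", LINEAGE))
```

The downstream walk answers the change-management question (what is affected by a change), while the upstream walk answers the governance question (where a report's data originated).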

I hope this short article has given you some key takeaways.

Please do not forget to show your love by giving it a like.

I would also like to have your valuable feedback in the comments.

Regards

Subhadeep Pal


Nikhil Kansari

Principal AI Product Manager | PSPO | Gen AI | LLMs | Prompt Engineering | Data Analytics | Cloud (SaaS, CPaaS, PaaS) | Growth Hacks | 0-1 & scaling 1-N | Intrapreneur | Mentor

3 years ago

Focus a little more on data catalog ontologies. They are data frameworks for representing shareable and reusable knowledge across products and domains. They will help you build effective models for your sub product line.
