What is Data Lineage?

In this post, I want to talk about something that sounds a bit daunting but is actually super helpful when it comes to Data Governance: Data Lineage.

What is Data Lineage?

In its simplest form, Data Lineage can be thought of as a diagram that shows you how data flows through an organisation, starting from the point where it first enters.

For example, imagine a customer placing an order on a website. That's where the data journey begins. From there, it might travel through various systems, such as order processing and inventory management, before landing in the organisation's data warehouse for reporting.

Now, that is a very straightforward example, and of course things can get more complex than that. However, the purpose of Data Lineage remains the same: to show which systems and processes your data passes through, no matter how simple or complex the journey.
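
If you'd like to record lineage somewhere other than a drawing, even a very simple data structure will do. The Python sketch below is one hypothetical way to capture the order example above: each system maps to the systems it passes data to, and a short function follows the data downstream. The system names come from the example; the dictionary layout and the `trace_forward` helper are illustrative assumptions, not a standard lineage format or any particular tool's API.

```python
from collections import deque

# Hypothetical lineage record for the order example: each system maps to
# the systems it sends data to next. This layout is an assumption, not a standard.
lineage: dict[str, list[str]] = {
    "Website (customer order)": ["Order processing"],
    "Order processing": ["Inventory management"],
    "Inventory management": ["Data warehouse"],
    "Data warehouse": [],  # the data lands here for reporting
}


def trace_forward(graph: dict[str, list[str]], start: str) -> list[str]:
    """Return the systems reached from `start`, in breadth-first order."""
    order: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        system = queue.popleft()
        order.append(system)
        for next_system in graph.get(system, []):
            if next_system not in seen:  # visit each system only once
                seen.add(next_system)
                queue.append(next_system)
    return order


if __name__ == "__main__":
    print(" -> ".join(trace_forward(lineage, "Website (customer order)")))
```

Running it prints `Website (customer order) -> Order processing -> Inventory management -> Data warehouse`. For flows that branch, the function returns systems in breadth-first order rather than a single chain, which is usually the point at which a proper diagram becomes more useful.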

The benefits and challenges of Data Lineage

Sometimes data takes unexpected routes as it moves from system to system, which can lead to problems further down the line. That's where Data Lineage comes in handy: it helps you understand how your data is flowing and spot potential issues.

That said, creating Data Lineage diagrams can be challenging at times. There are tools made specifically to help with this: automated tools can scan your databases and generate Data Lineage for you. The problem is that they often churn out large numbers of highly detailed diagrams, which can be overwhelming if you don't need that level of detail.

My advice? Keep it simple.

Start by focusing on the most important data for your organisation and work backwards. Ask the people who use that data where they get it from, then follow the breadcrumbs all the way back. I recommend this because it's really hard to work forwards when you're creating Data Lineage for data flows that have never been documented before.
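
The work-backwards approach can also be written down as you go. In the Python sketch below, each answer to "where do you get this data from?" is recorded as a (source, target) pair, and `trace_upstream` follows those answers back from the important data to every system that feeds it. The report name, the "Supplier feed" system and all function names are hypothetical, chosen only for illustration; this is not any lineage tool's API.

```python
from collections import defaultdict, deque

# Hypothetical answers collected by asking data users where their data
# comes from, recorded as (source, target) pairs.
answers: list[tuple[str, str]] = [
    ("Data warehouse", "Monthly sales report"),
    ("Order processing", "Data warehouse"),
    ("Inventory management", "Data warehouse"),
    ("Order processing", "Inventory management"),
    ("Website orders", "Order processing"),
    ("Supplier feed", "Inventory management"),
]


def build_sources(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each target to the list of systems that feed it directly."""
    sources: dict[str, list[str]] = defaultdict(list)
    for source, target in edges:
        sources[target].append(source)
    return sources


def trace_upstream(edges: list[tuple[str, str]], start: str) -> list[str]:
    """Work backwards from `start`, returning every upstream system once."""
    sources = build_sources(edges)
    upstream: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for source in sources.get(current, []):
            if source not in seen:
                seen.add(source)
                upstream.append(source)
                queue.append(source)
    return upstream


def original_sources(edges: list[tuple[str, str]], start: str) -> list[str]:
    """Upstream systems with no recorded source: where the trail ends so far."""
    sources = build_sources(edges)
    return [s for s in trace_upstream(edges, start) if not sources.get(s)]


if __name__ == "__main__":
    report = "Monthly sales report"
    print("Feeds the report:", trace_upstream(answers, report))
    print("Trail ends at:", original_sources(answers, report))
```

Any system in the "trail ends at" list is simply where the recorded answers stop: it may be a true starting point, or just the next system to ask about.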

If you're not sure where your data starts, I'd also recommend talking to some experienced, long-standing business analysts in your organisation. They will probably have a good idea of where the data flows.

So there you have it. Data Lineage isn't scary: creating high-level Data Lineage diagrams is actually fairly simple once you break it all down first.

Prefer this content in video form? Click here to watch the video.

If you found this helpful and would like to know more about Data Governance, feel free to book a call with me.


Originally published on www.nicolaaskham.com

If you enjoyed this article, don't forget to visit my LinkedIn profile and turn on post notifications (just hit the bell!). That way, you'll never miss my latest posts.

I would also say it helps investigate data quality or semantic issues. When someone points to a report and says it's wrong, the error is likely to be in the data that fed other data that fed the report, so you need to unpick that. At a higher level, the privacy team will need to document how personal data is processed under GDPR, so they will need to capture how that data flows.

Abhishek Anand

Data Management Professional

2 months

Nicola Askham, this couldn't be simplified further! I could relate to the challenges you described.

John Platten

Data Architect

2 months

You managed a presentation at Big Data London and a significant LinkedIn post straight after? Way to go, Nicola.

Gaurav Rawal

Curious Learner | PMP Trained | Data Management | Data Governance | PMO | Regulatory Reporting | Capital Market | Certified Business Analyst | BA

2 months

Insightful
