The Hitchhiker's Guide to Data Lineage - Part III

The Hitchhiker's Guide to Data Lineage - Part III

Welcome to "The Hitchhiker's Guide to Data Lineage" series! Ever felt like your data is lost in space? Wondering if it's been sucked into a black hole in the Oracle Cluster or floating around aimlessly like it's trying to find Planet Cognos? Don't panic!

In this journey through the data universe, we’ll explore the twists and turns of metadata lineage. Whether you’re a seasoned data astronaut or just learned that ETL isn’t a new energy drink, there’s something here for everyone.

Grab your towel, and let’s dive into today’s adventure!

Part Three: Inner-System Lineage

Welcome back, my fellow data travellers. Today we continue the data journey by examining the inner-system lineage and discussing cake. I know I may have sold some of you on the sausage analogy, but let’s not speak of sausage when we can just as easily speak of cake.

Before we roll up our sleeves and start baking, a word of warning for those of you who haven’t seen Office Space: you may not get the references ahead. If TPS reports and red staplers mean nothing to you, you have homework to complete. In the meanwhile, here's a meme especially for you:

What Is Inner-System Lineage?

Picture this: you’ve been presented with a delicious report... I mean cake. You know the basic ingredients, your data—flour, eggs, sugar, the works—but how did it come to be this report? Instead of thinking about how data moves across systems, like we did last week with cross-system lineage, we’re zooming in within one system. This is where Inner-System Lineage comes in. It's like watching the cake bake in reverse so that you understand exactly how and when every whisk, stir, and fold happens.

In short, inner-system lineage is all about tracking the exact transformations your data undergoes inside a single system, be it an SQL Stored Procedure, ETL process, or BI tool. It’s the nitty-gritty—showing you how raw data turns into something meaningful. Think of it as the recipe that’s hidden behind the final cake. Want to reverse-engineer it? OCTOPAI has your back.

Data Cake Gone Wrong: Enter the Panicked VP

Now, let’s put this into action with a little Office Space twist. Picture it: your VP of Finance (aka Lumbergh) pings you over Slack: "Yeeahhh, I'm gonna need you to go ahead and fix those revenue numbers. They're not adding up, m'kay?" Classic Lumbergh!

Do you:

  • A) Panic faster than data fleeing a corrupted hard drive?
  • B) Dust off your SQL spell books and start praying to the data gods?
  • C) Calmly fire up Octopai and start investigating like the data detective you are?

If you chose C, congratulations! You're ready to solve the Mystery of the Mismatched Metrics. (If you chose A or B, don't worry—we've all been there. Grab a cup of coffee and follow along!)

So, instead of flailing about like Milton looking for his red stapler, you fire up the platform and get to work. Let’s say the numbers in question are tied to the TotalProductCost column in the quarterly sales report. Octopai’s inner-system lineage map shows that this column is being sourced from the DwhFactSales table. A-ha! A clue! And didn’t Milton (poor, cake-deprived Milton) mention in this morning’s meeting that some changes were made to the FactSales table?

Digging into the Data Cake Recipe (AKA SSIS Packages and SQL Transformations)

Thanks to Octopai's magical lineage powers (no, it doesn't involve sacrificing raw data to the ETL gods), you can follow the breadcrumbs—or in this case, the data crumbs—straight to an SSIS package called populateStgFactPurchasing. You're now peeking inside the system's KitchenAid, watching how the ingredients mix together.

The inner-system lineage map reveals that TotalProductCost isn't just some static value someone typed after their third espresso. It's a carefully (we hope) calculated sum of three columns: OrderQuantity, UnitPrice, and UnitPriceDiscount.


Wait a parsec—Unit Price Discount? Didn’t Lumbergh say we’re not giving discounts anymore? Didn’t you get the memo?

The report is still factoring in those outdated discounts, which is why the revenue numbers are about as reliable as a “Jump to Conclusions” mat. So, how do you fix it?


Component Scripts: The Recipe’s Secret Ingredient

Imagine trying to bake a cake but skipping all the steps in the recipe. You just toss everything into the bowl and hope for the best. Spoiler alert: you're not getting cake; you're getting a disaster that would make even the most lenient data quality checker weep.

Octopai saves you from this culinary catastrophe by breaking down component scripts—the detailed methods that show how data transformations happen, step by step. In our case, it reveals the secret formula for TotalProductCost:

[OrderQty] * UnitPrice * (1 - [UnitPriceDiscount]) as TotalProductCost        


And there it is, sneakier than a semicolon in a thousand-line SQL query—the discount is still lurking in the formula! But armed with this information, you can update the transformation, remove the discount faster than you can say "data integrity," and get those numbers back on track.

The Data Hero's Journey (Now with 100% More Cake)

Now that you've identified and fixed the issue in the SSIS package, it's time for the final step: making sure this change doesn't create a data butterfly effect that turns all your other reports into digital modern art.

Luckily, Octopai's got your back again. You can use cross-system lineage (remember last week's thrilling episode?) to check if this update affects other reports or tables. It's like having a data crystal ball but with better error handling.

And just like that, you've saved the day—without needing to work on Saturday or, Cthulhu forbid, create a new TPS report.

The Icing on the Data Cake

Inner-system lineage is your all-access pass to the secret life of your data. It's like being the fly on the wall in your data's most intimate transformations, but less creepy and far more useful. Whether it's SQL queries doing the cha-cha, SSIS packages playing data Tetris, or BI tools painting by numbers, you can trace your data's journey from raw ingredients to gourmet insights.

The best part? You don't have to play the role of Milton, hopelessly searching for your metaphorical red stapler (or in this case, the root cause of your data discrepancies). With Octopai, you can investigate, trace, and fix issues faster than you can say "I believe you have my stapler."

So go ahead, data chefs. Fix those revenue reports, bake that perfect data cake, and walk away like the lineage legend you are. Just remember to save a slice for Milton, lest we have other fires we need to extinguish.

Curious how your data transforms behind the scenes? Want to be the Sherlock Holmes of your data domain? Octopai makes Inner-System Lineage elementary, my dear data scientist. Let's chat about how we can help you become the master chef of your data kitchen!

Next time, we'll tie this all together by exploring End-to-End column lineage, the product of Cross-System Lineage and Inner-System Lineage having a baby.

Gal Ziton

CDO & Co-Founder

1 个月

Thanks Adam Segal great article ??

要查看或添加评论,请登录

Adam Segal的更多文章

社区洞察

其他会员也浏览了