Episode 6: Part 2 - Documenting Data Lineage


The workshop took longer than expected, so we continued the following day.

Chike introduced me to the Monthly Operational Performance Report, an important dashboard used by senior management to monitor key metrics like transaction volume, branch efficiency, and customer satisfaction.

“Our job,” Chike said, “is to document the lineage of this report. We need to show how the data flows from its source systems, through processing steps, to its final presentation in the dashboard.”


Step 1: Identifying Data Sources

We began by reviewing the data sources feeding into the report:

  • Customer Database: Contains customer profiles and branch assignments.
  • Transaction Logs: Tracks daily financial transactions at each branch.
  • Employee Records System: Holds branch performance metrics.

Each source was linked to metadata that described its structure, owner, and update frequency.




Step 2: Mapping Data Transformations

Next, we looked at how the data was processed before reaching the report:

  • Data Cleansing: Duplicate records were removed, and missing values were handled.
  • Aggregation: Transaction data was grouped by branch and region for easier analysis.
  • Validation: Each dataset was cross-checked by the operations team for accuracy.
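To show what the cleansing and aggregation stages might look like in practice, here is a minimal Python sketch. It is illustrative only and not the bank's actual pipeline. The record fields (txn_id, branch, region, amount), the sample values, and the choice to drop records with a missing amount rather than impute one are all assumptions. The operations team's cross-check is a human review and is not modelled.

```python
from collections import defaultdict

# Hypothetical transaction records; field names and values are illustrative assumptions.
transactions = [
    {"txn_id": "T1", "branch": "Branch A", "region": "Region 1", "amount": 5000.0},
    {"txn_id": "T1", "branch": "Branch A", "region": "Region 1", "amount": 5000.0},  # duplicate
    {"txn_id": "T2", "branch": "Branch A", "region": "Region 1", "amount": None},    # missing value
    {"txn_id": "T3", "branch": "Branch B", "region": "Region 2", "amount": 12000.0},
    {"txn_id": "T4", "branch": "Branch A", "region": "Region 1", "amount": 3000.0},
]

def cleanse(records):
    """Remove duplicate transactions (by txn_id) and drop records missing an amount."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["txn_id"] in seen:
            continue  # duplicate: keep only the first occurrence
        seen.add(rec["txn_id"])
        if rec["amount"] is None:
            continue  # missing value: excluded here (assumption), not imputed
        cleaned.append(rec)
    return cleaned

def aggregate(records):
    """Group cleansed transactions by (region, branch), counting and totalling them."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for rec in records:
        key = (rec["region"], rec["branch"])
        summary[key]["count"] += 1
        summary[key]["total"] += rec["amount"]
    return dict(summary)

summary = aggregate(cleanse(transactions))
for (region, branch), stats in sorted(summary.items()):
    print(region, branch, stats["count"], stats["total"])
```

Dropping incomplete records is only one way to handle missing values. Imputing a default or flagging the record for follow-up are equally valid choices, and whichever one is used should itself be recorded in the lineage.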

Chike demonstrated how to use the data catalog tool to map these steps.

“Each transformation adds context to the lineage,” he explained. “If there’s an issue in the report, this map helps us trace it back to the source and fix it.” He added these details to the data catalog, noting each step in the process and linking it to the relevant metadata.
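Chike's point about tracing an issue back to its source can be illustrated with a small lineage graph. This sketch does not use any particular data catalog tool's API. Every node name is hypothetical, and the wiring is an assumption loosely based on Steps 1 and 2 (for example, that branch assignments from the Customer Database feed the aggregation). Walking the graph upstream from the report lists every stage and source system that could be responsible for a faulty figure.

```python
from collections import deque

# Hypothetical lineage map: each node lists the nodes it is derived from.
LINEAGE = {
    "customer_db": [],
    "transaction_logs": [],
    "employee_records": [],
    "cleansed_transactions": ["transaction_logs"],
    "branch_region_aggregates": ["cleansed_transactions", "customer_db"],
    "validated_metrics": ["branch_region_aggregates", "employee_records"],
    "monthly_ops_report": ["validated_metrics"],
}

def upstream(node, lineage=LINEAGE):
    """Return every node that feeds into `node`, nearest first (breadth-first walk)."""
    order, seen = [], {node}
    queue = deque(lineage[node])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(lineage[current])
    return order

def sources_of(node, lineage=LINEAGE):
    """Return only the original source systems (nodes with no inputs) behind `node`."""
    return [n for n in upstream(node, lineage) if not lineage[n]]

print(upstream("monthly_ops_report"))
print(sources_of("monthly_ops_report"))
```

If the cleansing step is suspected, upstream("cleansed_transactions") returns only ["transaction_logs"], so the investigation can start there instead of across every system.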


Step 3: Documenting Metadata

With the lineage mapped, we documented metadata for each stage:

Descriptive Metadata

  • Title: Monthly Operational Performance Report.
  • Summary: A dashboard summarizing branch performance metrics for senior management.

Provenance Metadata

  • Data Origin: Extracted from customer and transaction systems.
  • Processing Steps: Aggregated and validated before being visualized in Power BI.

Technical Metadata

  • Format: Source data stored in CSV files; final report in a Power BI dashboard.
  • Frequency: Updated weekly.

Administrative Metadata

  • Owner: Operations Team.
  • Retention Policy: Stored for 12 months, then archived.
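One way to picture the resulting catalog entry is as a structured record grouped by the four metadata categories. This Python sketch copies its values from the lists above. The class and field names are assumptions, not the schema of the bank's catalog tool, and so is splitting the retention policy into a month count and an after-retention action.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Descriptive:
    title: str
    summary: str

@dataclass
class Provenance:
    data_origin: str
    processing_steps: str

@dataclass
class Technical:
    storage_format: str
    frequency: str

@dataclass
class Administrative:
    owner: str
    retention_months: int   # "Stored for 12 months..."
    after_retention: str    # "...then archived" (split into two fields by assumption)

@dataclass
class CatalogEntry:
    descriptive: Descriptive
    provenance: Provenance
    technical: Technical
    administrative: Administrative

report_entry = CatalogEntry(
    descriptive=Descriptive(
        title="Monthly Operational Performance Report",
        summary="A dashboard summarizing branch performance metrics for senior management.",
    ),
    provenance=Provenance(
        data_origin="Extracted from customer and transaction systems.",
        processing_steps="Aggregated and validated before being visualized in Power BI.",
    ),
    technical=Technical(
        storage_format="Source data stored in CSV files; final report in a Power BI dashboard.",
        frequency="Updated weekly.",
    ),
    administrative=Administrative(
        owner="Operations Team",
        retention_months=12,
        after_retention="archive",
    ),
)

# asdict() recurses into the nested dataclasses, giving plain dicts for JSON export.
print(json.dumps(asdict(report_entry), indent=2))
```

Serializing the entry this way produces a plain JSON document that other tools could read. Real catalog products define their own formats, so treat this only as a picture of how the categories nest.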


Step 4: Validating the Metadata

Once we’d documented the metadata, Chike showed me how to validate it:

  • Accuracy: Cross-checking metadata against source system documentation.
  • Completeness: Ensuring every data element had descriptive, structural, and administrative metadata.
  • Consistency: Making sure naming conventions matched the bank’s standards.
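The completeness and consistency checks lend themselves to automation. This Python sketch is illustrative and rests on several assumptions. It treats the four groups documented in Step 3 as the required categories and lower snake_case as the naming convention, although neither is stated as the bank's actual standard, and the two catalog elements are invented test data. The accuracy check, which cross-references source system documentation, remains a manual review here.

```python
import re

# Assumed convention: element names are lower snake_case (e.g. "branch_id").
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")
# Assumed required categories, mirroring the four groups from Step 3.
REQUIRED_CATEGORIES = ("descriptive", "provenance", "technical", "administrative")

def validate_element(name, metadata):
    """Return a list of problems found for one data element's metadata."""
    problems = []
    # Completeness: every required category is present and non-empty.
    for category in REQUIRED_CATEGORIES:
        if not metadata.get(category):
            problems.append(f"{name}: missing {category} metadata")
    # Consistency: the element name follows the naming convention.
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"{name}: name does not match naming convention")
    return problems

# Hypothetical catalog extract used only to exercise the checks.
elements = {
    "branch_id": {
        "descriptive": {"summary": "Unique branch identifier"},
        "provenance": {"origin": "Customer Database"},
        "technical": {"type": "string"},
        "administrative": {"owner": "Operations Team"},
    },
    "TxnAmount": {
        "descriptive": {"summary": "Transaction value"},
        "provenance": {"origin": "Transaction Logs"},
        "technical": {"type": "decimal"},
        "administrative": {},
    },
}

report = [p for name, meta in elements.items() for p in validate_element(name, meta)]
for problem in report:
    print(problem)
```

Here branch_id passes both checks. TxnAmount is flagged twice: once for its empty administrative metadata and once for breaking the assumed naming convention.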

“This is where metadata governance comes in,” Chike said. “Good metadata needs to be understandable and usable by everyone.”


Step 5: Validating the Lineage

After completing the documentation, we reviewed the lineage with key stakeholders:

  • Operations Team: Confirmed the accuracy of the aggregation and validation steps.
  • IT Team: Verified technical details, such as data formats and update schedules.
  • Compliance Team: Ensured the metadata aligned with regulations such as the Nigeria Data Protection Regulation (NDPR).

Their feedback helped refine the lineage and fill in missing details.


As we wrapped up, Chike asked me what I’d learned from the task.

“Data lineage is like telling a story,” I said. “It shows where the data started, what happened to it, and how it ended up in the report. Without this, we’d be working blind.”

Chike nodded. “Exactly. Metadata—and lineage in particular—is what makes our data governance work. It gives us traceability, accountability, and trust.”

That evening, I felt a sense of accomplishment. The task had shown me how metadata brings clarity and order to the complex world of data governance.

More articles by Oyinlola Oresanya

  • Episode 10: A Month of Growth
  • The Myth of Perfect Data Governance: Why Good Enough Is Enough
  • Design a Personalized Growth Plan
  • Episode 9: Balancing Compliance and Innovation
  • Episode 8: The First Presentation
  • Episode 7: Tackling Data Quality Issues
  • Episode 6: What is Metadata?
  • Episode 5: Finding My Feet
  • Ada wishes you a Happy New Year!
  • Episode 4: Building the Skillset
